<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Bigdata Continuous Delivery in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123053#M85806</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We have started studies in order to implement a Big Data continuous delivery process, and we'd like to know if someone has implemented one.&lt;/P&gt;&lt;P&gt;What we need to know is whether there are any best practices for:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Dev environment&lt;/LI&gt;&lt;LI&gt;Build process&lt;/LI&gt;&lt;LI&gt;Deploying to a unit test env&lt;/LI&gt;&lt;LI&gt;Deploying to an integration test env&lt;/LI&gt;&lt;LI&gt;Deploying to production&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Basically we develop with Hive, Python (Spark), shell scripts, Flume and Sqoop. Once all of the above is defined, we would like to provision these environments in containers to set up continuous integration and deployment via Mesos + Jenkins + Marathon + Docker, spinning up containers with Hortonworks HDP 2.2.0 (same as the production env).&lt;/P&gt;&lt;P&gt;Many thanks,&lt;/P&gt;&lt;P&gt;Fabricio&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 10:23:43 GMT</pubDate>
    <dc:creator>fabricio_carbon</dc:creator>
    <dc:date>2022-09-16T10:23:43Z</dc:date>
    <item>
      <title>Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123053#M85806</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We have started studies in order to implement a Big Data continuous delivery process, and we'd like to know if someone has implemented one.&lt;/P&gt;&lt;P&gt;What we need to know is whether there are any best practices for:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Dev environment&lt;/LI&gt;&lt;LI&gt;Build process&lt;/LI&gt;&lt;LI&gt;Deploying to a unit test env&lt;/LI&gt;&lt;LI&gt;Deploying to an integration test env&lt;/LI&gt;&lt;LI&gt;Deploying to production&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Basically we develop with Hive, Python (Spark), shell scripts, Flume and Sqoop. Once all of the above is defined, we would like to provision these environments in containers to set up continuous integration and deployment via Mesos + Jenkins + Marathon + Docker, spinning up containers with Hortonworks HDP 2.2.0 (same as the production env).&lt;/P&gt;&lt;P&gt;Many thanks,&lt;/P&gt;&lt;P&gt;Fabricio&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:23:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123053#M85806</guid>
      <dc:creator>fabricio_carbon</dc:creator>
      <dc:date>2022-09-16T10:23:43Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123054#M85807</link>
      <description>&lt;P&gt;Dear Fabricio,&lt;/P&gt;&lt;P&gt;Yes, we have several customers working on this topic; it is an interesting one. From what I saw last time, the architecture was based on 2 real clusters, one PROD and one DR + TEST + INTEGRATION, with YARN queues and HDFS quotas configured accordingly, and Jenkins + SVN to take care of versioning, build and test.&lt;/P&gt;&lt;P&gt;Some teams have also built their own project to validate the development and follow the deployment across the different environments.&lt;/P&gt;&lt;P&gt;I don't know much about Docker, Mesos or Marathon, so I can't answer for that part.&lt;/P&gt;&lt;P&gt;Can you perhaps give me more details about what you are looking for? What did you try?&lt;/P&gt;&lt;P&gt;Kind regards.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2016 18:01:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123054#M85807</guid>
      <dc:creator>mlanciaux</dc:creator>
      <dc:date>2016-06-09T18:01:43Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123055#M85808</link>
      <description>&lt;P&gt;Hi mlanciaux,&lt;/P&gt;&lt;P&gt;Thanks for your reply.&lt;/P&gt;&lt;P&gt;Let's put aside Docker, Mesos and Marathon; that was just one way I had found to follow.&lt;/P&gt;&lt;P&gt;We do not have 2 clusters, only something like a dev one, a small portion of the production env. So let's suppose DEV + TEST + INTEGRATION on this small one.&lt;/P&gt;&lt;P&gt;I wonder if you could help by sharing some paper where I could start. I've found a lot of information and different approaches. Is there anything Hortonworks could recommend along the same lines, i.e. Jenkins + SVN or Git?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Fabricio&lt;/P&gt;</description>
      <pubDate>Sat, 11 Jun 2016 00:48:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123055#M85808</guid>
      <dc:creator>fabricio_carbon</dc:creator>
      <dc:date>2016-06-11T00:48:31Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123056#M85809</link>
      <description>&lt;P&gt;Sure. Basically, regarding the cluster, you may find it useful to:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Configure queues with the Capacity Scheduler (production, dev, integration, test), using elasticity and preemption&lt;/LI&gt;&lt;LI&gt;Map users to queues&lt;/LI&gt;&lt;LI&gt;Use a naming convention for queues and users, e.g. a -dev or -test suffix&lt;/LI&gt;&lt;LI&gt;Depending on the tool you are using, use&lt;UL&gt;&lt;LI&gt;Different database names with Hive&lt;/LI&gt;&lt;LI&gt;Different directories with HDFS + quotas&lt;/LI&gt;&lt;LI&gt;Namespaces for HBase&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Use Ranger to configure the permissions for each user/group to access the right resources&lt;/LI&gt;&lt;LI&gt;Give each user different environment settings&lt;/LI&gt;&lt;LI&gt;Use Jenkins and Maven (if needed) to build, push the code (with the SSH plugin) and run the tests&lt;/LI&gt;&lt;LI&gt;Use templates to provide tools to the users with logging features and the correct parameters and options&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Sun, 12 Jun 2016 18:22:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123056#M85809</guid>
      <dc:creator>mlanciaux</dc:creator>
      <dc:date>2016-06-12T18:22:06Z</dc:date>
    </item>
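The queue and naming-convention advice above can be sketched in a few lines. This is a minimal illustration, assuming queue names follow the suggested app-plus-environment-suffix convention; the app names, queue names and jar path are hypothetical placeholders, not values from the thread:

```python
# Sketch: derive a YARN queue name from an app name and an environment
# suffix, following the "-dev" / "-test" naming convention suggested
# above. All concrete names here are hypothetical examples.

VALID_ENVS = ("dev", "test", "integration", "prod")

def queue_for(app, env):
    """Return the Capacity Scheduler queue for an app in a given env."""
    if env not in VALID_ENVS:
        raise ValueError("unknown environment: " + env)
    # Production jobs share the production queue; other environments
    # get a per-environment queue, e.g. "etl-dev".
    return "production" if env == "prod" else app + "-" + env

def submit_args(app, env, jar):
    """Build spark-submit arguments targeting the right queue."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--queue", queue_for(app, env),
        jar,
    ]

print(submit_args("etl", "dev", "etl.jar"))
```

Keeping the queue choice in one helper like this means a CI job only needs a single environment parameter to route work correctly.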
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123057#M85810</link>
      <description>&lt;P&gt;OK, thanks!&lt;/P&gt;&lt;P&gt;Regarding the cluster, we are almost OK.&lt;/P&gt;&lt;P&gt;My concern is about the last two points.&lt;/P&gt;&lt;P&gt;Would you have specific documentation/configuration on installing Jenkins properly to work with a Hortonworks cluster?&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 23:52:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123057#M85810</guid>
      <dc:creator>fabricio_carbon</dc:creator>
      <dc:date>2016-06-13T23:52:30Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123058#M85811</link>
      <description>&lt;P&gt;I think the key point is to configure the different Jenkins instances to use the edge node via the SSH plugins (or install Jenkins there). The rest is a matter of configuring security and backups, and choosing the right set of parameters to fit your usage and switch easily from one environment to another (dev, test, prod).&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jun 2016 02:58:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123058#M85811</guid>
      <dc:creator>mlanciaux</dc:creator>
      <dc:date>2016-06-14T02:58:21Z</dc:date>
    </item>
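The environment-switching idea above (same job, different connection parameters) can be sketched as a small lookup that a parameterized CI job would consult. The hostnames, user and paths below are hypothetical placeholders, not values from the thread:

```python
# Sketch of per-environment deployment settings that a parameterized
# Jenkins job could select with a single "env" parameter. Hostnames
# and directories are hypothetical placeholders.

ENVIRONMENTS = {
    "dev":  {"edge_node": "edge-dev.example.com",  "deploy_dir": "/apps/dev"},
    "test": {"edge_node": "edge-test.example.com", "deploy_dir": "/apps/test"},
    "prod": {"edge_node": "edge-prod.example.com", "deploy_dir": "/apps/prod"},
}

def scp_command(env, artifact):
    """Build the scp command a CI job could run to push an artifact
    to the edge node of the chosen environment."""
    cfg = ENVIRONMENTS[env]
    target = cfg["edge_node"] + ":" + cfg["deploy_dir"]
    return ["scp", artifact, target]

print(scp_command("test", "etl.jar"))
```

Promoting a build from dev to test to prod then only changes the lookup key, which keeps the job definitions identical across environments.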
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123059#M85812</link>
      <description>&lt;P&gt;Here is a simple example: &lt;A href="https://community.hortonworks.com/articles/40171/simple-example-of-jenkins-hdp-integration.html"&gt;https://community.hortonworks.com/articles/40171/simple-example-of-jenkins-hdp-integration.html&lt;/A&gt;. I will add more later.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jun 2016 20:15:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123059#M85812</guid>
      <dc:creator>mlanciaux</dc:creator>
      <dc:date>2016-06-16T20:15:45Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123060#M85813</link>
      <description>&lt;P&gt;And don't forget to check these best practices: &lt;A target="_blank" href="https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+Best+Practices"&gt;https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+Best+Practices&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jun 2016 22:23:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123060#M85813</guid>
      <dc:creator>mlanciaux</dc:creator>
      <dc:date>2016-06-16T22:23:12Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123061#M85814</link>
      <description>&lt;P style="margin-left: 20px;"&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/434/mlanciaux.html" nodeid="434"&gt;@mlanciaux&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jun 2016 00:06:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123061#M85814</guid>
      <dc:creator>fabricio_carbon</dc:creator>
      <dc:date>2016-06-17T00:06:38Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123062#M85815</link>
      <description>&lt;P&gt;Dear Fabricio, I successfully made a workflow run from my local VM against my remote Hadoop cluster by changing the SSH connection property. Hope that helps. Kind regards.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jun 2016 10:50:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123062#M85815</guid>
      <dc:creator>mlanciaux</dc:creator>
      <dc:date>2016-06-20T10:50:05Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123063#M85816</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;We are using Scala with Maven to build Spark applications, with Git as the code repository and Jenkins integrated with Git to build the jar.&lt;/P&gt;&lt;P&gt;I am not sure how to use Jenkins to deploy our apps on the cluster.&lt;/P&gt;&lt;P&gt;Can anyone explain what the next step could be?&lt;/P&gt;&lt;P&gt;Does Jenkins support deployment of Spark apps like it does for other apps?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 28 Sep 2017 11:58:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123063#M85816</guid>
      <dc:creator>varunjoshi</dc:creator>
      <dc:date>2017-09-28T11:58:57Z</dc:date>
    </item>
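One way a Jenkins job can deploy a Spark jar after the Maven build, following the SSH-to-edge-node approach discussed earlier in the thread, is to run spark-submit on the edge node over SSH. A hedged sketch; the host, user, class name and jar path are hypothetical placeholders:

```python
import subprocess  # used in the commented-out execution step below

# Sketch: after Jenkins builds the jar with Maven, launch the Spark job
# on the cluster's edge node over SSH. Host, user, main class and jar
# path are hypothetical placeholders, not values from the thread.

def deploy_spark_app(edge_node, jar_path, main_class, queue):
    """Build the ssh command that launches spark-submit remotely."""
    remote_cmd = " ".join([
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--queue", queue,
        "--class", main_class,
        jar_path,
    ])
    cmd = ["ssh", "jenkins@" + edge_node, remote_cmd]
    # In a real Jenkins job this would actually be executed, e.g.:
    # subprocess.run(cmd, check=True)
    return cmd

print(deploy_spark_app("edge.example.com", "/apps/etl.jar",
                       "com.example.Etl", "etl-dev"))
```

Returning the command instead of running it directly keeps the sketch testable; a Jenkins shell step or the SSH plugin would perform the actual execution.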
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123064#M85817</link>
      <description>&lt;P&gt;Dear @Fabricio Carboni,&lt;/P&gt;&lt;P&gt;Can you please share some documentation on how we can implement CI/CD for PySpark-based applications? Also, is it possible to do it without using containers (like we do for development in Java/Scala: first locally on Windows, then building on Linux dev/test/prod)?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Abhinav&lt;/P&gt;</description>
      <pubDate>Wed, 12 Dec 2018 22:09:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123064#M85817</guid>
      <dc:creator>soti_abhinav</dc:creator>
      <dc:date>2018-12-12T22:09:23Z</dc:date>
    </item>
    <item>
      <title>Re: Bigdata Continuous Delivery</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123065#M85818</link>
      <description>&lt;P&gt;Hi, were you able to find a solution to this? We have a similar setup and I can't seem to find any examples of it.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Feb 2019 22:58:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Bigdata-Continuous-Delivery/m-p/123065#M85818</guid>
      <dc:creator>c-joe_boctor</dc:creator>
      <dc:date>2019-02-08T22:58:08Z</dc:date>
    </item>
  </channel>
</rss>

