Bigdata Continuous Delivery


Hi,

We have started investigating how to implement a Big Data continuous delivery process, and we'd like to know if anyone has implemented one.

What we need to know is whether there are any best practices for:

  • Dev environment
  • Building process
  • Deploy on unit test env
  • Deploy on integration test env
  • Deploy on production

Basically we develop with Hive, Python (Spark), shell scripts, Flume, and Sqoop. Once all of the above is defined, we would like to provision these environments in containers and set up continuous integration/deployment via:

Mesos + Jenkins + Marathon + Docker containers to spin up Docker containers running Hortonworks HDP 2.2.0 (the same as the production environment).
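
To make that concrete, something along the following lines is what we are picturing for spinning up an HDP container through Marathon's REST API; the Marathon host, image name, and resource figures are only placeholders, nothing we have running yet:

    # Sketch only: register a (placeholder) HDP container as a Marathon app
    curl -X POST http://marathon.example.com:8080/v2/apps \
         -H "Content-Type: application/json" \
         -d '{
               "id": "/bigdata/hdp-dev",
               "cpus": 4, "mem": 16384, "instances": 1,
               "container": {
                 "type": "DOCKER",
                 "docker": { "image": "example/hdp:2.2.0", "network": "HOST" }
               }
             }'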

Many thanks,

Fabricio

1 ACCEPTED SOLUTION

Rising Star

Sure. Basically, regarding the cluster, you may find it useful to:

  • Configure queues with the Capacity Scheduler (production, dev, integration, test) and use elasticity and preemption
  • Map users to queues
  • You can use a naming convention for queues and users, e.g. a -dev or -test suffix
  • Depending on the tool you are using, you can use (see the sketch after this list):
    • Different database names with Hive
    • Different directories with HDFS + quotas
    • Namespaces for HBase
  • Ranger will help you configure permissions for each user/group to access the right resources
  • Each user will have different environment settings
  • Use Jenkins and Maven (if needed) to build, push the code (with the SSH plugin), and run the tests
  • Use templates to provide tools to users with logging features and the correct parameters and options
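
As a rough sketch of the per-environment separation above (the application name, quota size, connection string, and namespace are only examples; adapt them to your own conventions):

    # Example: carve out a "dev" environment for an application called "sales"
    # (paths, names, and quota are illustrative only)

    # Dedicated HDFS directory with a space quota
    hdfs dfs -mkdir -p /data/sales-dev
    hdfs dfsadmin -setSpaceQuota 500g /data/sales-dev

    # Dedicated Hive database
    beeline -u jdbc:hive2://hiveserver:10000 -e "CREATE DATABASE IF NOT EXISTS sales_dev"

    # Dedicated HBase namespace
    echo "create_namespace 'sales_dev'" | hbase shell

    # Jobs for this environment are then submitted to the matching YARN queue,
    # e.g. spark-submit --queue dev ...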



Contributor

Hello Everyone,

We are using Scala with Maven to build Spark applications, along with Git as the code repository and Jenkins integrated with Git to build the JAR.

I am not sure how to use Jenkins to deploy our apps on the cluster.

Can anyone explain what could be the next step?

Does Jenkins support deployment of Spark apps like it does for other apps?
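
For what it's worth, what I am picturing is an "Execute shell" step in the Jenkins job that copies the JAR built by Maven to an edge node and launches it with spark-submit; the host, paths, and class name below are just placeholders:

    # Hypothetical Jenkins shell step after the Maven build
    scp target/myapp-1.0.jar deploy@edge-node.example.com:/opt/apps/
    ssh deploy@edge-node.example.com \
        spark-submit --master yarn --deploy-mode cluster \
                     --class com.example.MyApp /opt/apps/myapp-1.0.jar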

Thanks

New Contributor

Dear @Fabricio Carboni:

Can you please share some documentation on how we can implement CI/CD for PySpark-based applications? Also, is it possible to do it without using containers (like we do development in Java/Scala: first locally on Windows and then build it on Linux dev/test/prod)?
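
To give an idea of what I mean, I was picturing a plain Jenkins shell step that packages the PySpark code and submits it to each environment without any containers; the edge node, paths, and queue name below are made up:

    # Hypothetical container-less deploy step for a PySpark app
    zip -r deps.zip mypackage/          # bundle the Python modules
    scp deps.zip main.py deploy@edge-node.example.com:/opt/pyspark-app/
    ssh deploy@edge-node.example.com \
        spark-submit --master yarn --deploy-mode cluster \
                     --queue dev \
                     --py-files /opt/pyspark-app/deps.zip \
                     /opt/pyspark-app/main.py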

Thanks

Abhinav

New Contributor

Hi, were you able to find a solution to this? We have a similar setup and I can't seem to find any examples of it.