
Bigdata Continuous Delivery

New Contributor

Hi,

We have started studying how to implement a Big Data continuous delivery process, and we'd like to know if anyone has already implemented one.

What we need to know is whether there are any best practices for:

  • Dev environment
  • Building process
  • Deploy on unit test env
  • Deploy on integration test env
  • Deploy on production

Basically we develop with Hive, Python (Spark), shell scripts, Flume, and Sqoop. Once all of the above is defined, we would like to provision these environments in containers and set up continuous integration and deployment via:

Mesos + Jenkins + Marathon + Docker containers, spinning up Docker instances with Hortonworks HDP 2.2.0 (the same as the production environment).
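To make the idea more concrete, the kind of Jenkins shell step we are imagining looks roughly like the sketch below; the image name, registry, Marathon URL, and app definition are only placeholders, not a working setup:

  #!/usr/bin/env bash
  # Hypothetical Jenkins build step: package the job into a Docker image built
  # on an HDP 2.2 base image, push it, and (re)deploy it through Marathon.
  # Registry, image name, and Marathon host are placeholders.
  set -euo pipefail

  APP_VERSION="${BUILD_NUMBER:-local}"   # Jenkins exposes BUILD_NUMBER
  IMAGE="registry.example.com/etl/ingest-job:${APP_VERSION}"

  # 1. Build and publish the image (Dockerfile assumed to start FROM an HDP 2.2 base)
  docker build -t "${IMAGE}" .
  docker push "${IMAGE}"

  # 2. Ask Marathon to create or update the app definition with the new image
  cat > marathon-app.json <<EOF
  {
    "id": "/etl/ingest-job",
    "cpus": 1,
    "mem": 2048,
    "instances": 1,
    "container": {
      "type": "DOCKER",
      "docker": { "image": "${IMAGE}", "network": "BRIDGE" }
    }
  }
  EOF

  curl -sf -X PUT -H "Content-Type: application/json" \
       -d @marathon-app.json \
       "http://marathon.example.com:8080/v2/apps/etl/ingest-job"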

Many thanks,

Fabricio

12 REPLIES

Explorer

Hello Everyone,

We are using Scala with Maven to build Spark applications, with Git as the code repository and Jenkins integrated with Git to build the JAR.

I am not sure how to use Jenkins to deploy our apps on the cluster.

Can anyone explain what could be the next step?

Does Jenkins support deployment of Spark apps like it does for other apps?
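For reference, the rough "Execute shell" step we are picturing after the Maven build is something like the following; the gateway host, paths, and main class are placeholders on our side, not a tested setup:

  #!/usr/bin/env bash
  # Hypothetical Jenkins shell step: build the JAR with Maven, copy it to a
  # gateway/edge node, and launch it with spark-submit on YARN.
  # Host name, paths, and the main class are placeholders.
  set -euo pipefail

  mvn -B clean package

  JAR=target/spark-etl-1.0-SNAPSHOT.jar
  GATEWAY=edge-node.example.com

  scp "${JAR}" "jenkins@${GATEWAY}:/opt/jobs/"

  ssh "jenkins@${GATEWAY}" \
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.etl.Main \
      /opt/jobs/spark-etl-1.0-SNAPSHOT.jar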

Thanks

New Contributor

Dear @Fabricio Carboni:

Can you please share some documentation on how we can implement CI/CD for PySpark-based applications? Also, is it possible to do it without using containers, the way we develop in Java/Scala (first locally on Windows and then built and deployed on Linux dev/test/prod)?
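For context, the container-less flow I have in mind would be a per-environment script roughly like this; the environment names, hosts, paths, and the zip packaging are my own assumptions, not a confirmed approach:

  #!/usr/bin/env bash
  # Sketch of a container-less deploy for a PySpark job: zip the Python
  # package, copy it to the target environment's edge node, and run
  # spark-submit there. ENV, hosts, and paths are placeholders.
  set -euo pipefail

  ENV="${1:?usage: deploy.sh dev|test|prod}"
  EDGE_NODE="edge.${ENV}.example.com"

  # Package the Python modules the driver imports (etl/ is the project package)
  zip -r libs.zip etl/

  scp libs.zip jobs/main.py "jenkins@${EDGE_NODE}:/opt/pyspark-jobs/"

  ssh "jenkins@${EDGE_NODE}" \
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --py-files /opt/pyspark-jobs/libs.zip \
      /opt/pyspark-jobs/main.py --env "${ENV}"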

Thanks

Abhinav

New Contributor

Hi, were you able to find a solution to this? We have a similar setup and I can't seem to find any examples of it.