We have started investigating how to implement a Continuous Delivery process for Big Data. We'd like to know if anyone has implemented one.
What we need to know is whether there are any best practices for our stack. Basically we develop with: Hive, Python (Spark), shell scripts, Flume, and Sqoop. Once all of the above is defined, we would like to provision these environments in containers and set up continuous integration/deployment via:
Mesos + Jenkins + Marathon + Docker, spinning up containers running Hortonworks HDP 2.2.0 (the same as our production environment).
We are using Scala with Maven to build Spark applications, with Git as the code repository and Jenkins integrated with Git to build the JAR.
I am not sure how to use Jenkins to deploy our apps on the cluster.
Can anyone explain what could be the next step?
Does Jenkins support deployment of Spark apps like it does for other apps?
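(Not an official answer, just a sketch of one common approach: Jenkins itself has no Spark-specific deployment support, but a pipeline can build the JAR with Maven and then hand it to the cluster with `spark-submit`. In the declarative Jenkinsfile below, the main class `com.example.MainApp`, the JAR name, and the YARN master are placeholders — adjust them for your project.)

```groovy
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // Build the application JAR from the Maven project in the Git repo
                sh 'mvn -B clean package'
            }
        }
        stage('Deploy') {
            steps {
                // Submit the freshly built JAR to the cluster.
                // Assumes spark-submit is on the PATH of the Jenkins agent
                // and the agent has access to the YARN cluster.
                sh '''
                    spark-submit \
                      --master yarn \
                      --deploy-mode cluster \
                      --class com.example.MainApp \
                      target/my-spark-app-1.0.jar
                '''
            }
        }
    }
}
```

With `--deploy-mode cluster`, the driver runs on the cluster rather than on the Jenkins agent, so the Jenkins job only needs network access to the resource manager, not a full Spark runtime of its own.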
Dear @Fabricio Carboni:
Can you please share some documentation on how we can implement CI/CD for PySpark-based applications? Also, is it possible to do it without using containers (like we do development in Java/Scala: first locally on Windows, then build it on Linux dev/tst/prod)?
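(A sketch, not a confirmed answer: containers are not required, because `spark-submit` can ship a `.py` file to the cluster directly. The pipeline fragment below assumes a hypothetical edge node `edge-node`, user `deploy`, job file `my_job.py`, and a `tests/` directory — all placeholders to adapt.)

```groovy
pipeline {
    agent any
    stages {
        stage('Test') {
            steps {
                // Run the unit tests locally on the Jenkins agent
                sh 'python -m pytest tests/'
            }
        }
        stage('Deploy') {
            steps {
                // Copy the PySpark script to an edge node and submit it to YARN.
                // No Docker involved: spark-submit on the edge node talks to the
                // cluster directly, mirroring the dev -> tst -> prod promotion
                // you describe for Java/Scala builds.
                sh 'scp my_job.py deploy@edge-node:/apps/my_job.py'
                sh '''
                    ssh deploy@edge-node \
                      "spark-submit --master yarn --deploy-mode cluster /apps/my_job.py"
                '''
            }
        }
    }
}
```

Promotion through dev/tst/prod can then be modeled as separate Jenkins jobs (or pipeline parameters) pointing at different edge nodes and YARN queues.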