
Bigdata Continuous Delivery


Hi,

We have started studies in order to implement a Bigdata Continuous Delivery process. We'd like to know if anyone has implemented one.

What we need to know is whether there are any best practices for:

  • Dev environment
  • Building process
  • Deploy on unit test env
  • Deploy on integration test env
  • Deploy on production

Basically we develop with Hive, Python (Spark), shell scripts, Flume, and Sqoop. Once all of the above is defined, we would like to provision these environments in containers to set up continuous integration and deployment via:

Mesos + Jenkins + Marathon + Docker containers to spin up containers with Hortonworks HDP 2.2.0 (same as the production environment).
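For illustration only, a minimal Marathon app definition to spin up such a container might look like the sketch below (the Docker image name and resource sizes are placeholders, not a tested setup):

    # Hypothetical app definition; image name and resource sizes are placeholders
    cat > hdp-dev.json <<'EOF'
    {
      "id": "/hdp-dev",
      "cpus": 4,
      "mem": 8192,
      "instances": 1,
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "example/hdp-2.2.0",
          "network": "HOST"
        }
      }
    }
    EOF
    # Submit it to Marathon's REST API
    curl -X POST http://marathon:8080/v2/apps \
      -H "Content-Type: application/json" -d @hdp-dev.json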

Many thanks,

Fabricio

1 ACCEPTED SOLUTION

Rising Star

Sure. Basically, regarding the cluster, you may find it useful to:

  • Configure queues with the Capacity Scheduler (production, dev, integration, test), using elasticity and preemption (a configuration sketch follows this list)
  • Map users to queues
  • Use a naming convention for queues and users, e.g. a -dev or -test suffix
  • Depending on the tool you are using, separate environments with (see the commands sketched after this list):
    • Different database names in Hive
    • Different directories in HDFS, with quotas
    • Namespaces in HBase
  • Ranger will help you configure the permissions each user / group needs to access the right resources
  • Each user will have different environment settings
  • Use Jenkins and Maven (if needed) to build, push the code (with the SSH plugin), and run the tests (a pipeline sketch follows below)
  • Use templates to provide tools to users, with logging features and the correct parameters and options
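For the queue setup, a minimal capacity-scheduler.xml sketch might look like the following (the percentages are placeholder values to tune for your cluster):

    <!-- Sketch: four top-level queues; guaranteed capacities must sum to 100.
         Elasticity = maximum-capacity set above the guaranteed capacity. -->
    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>production,dev,integration,test</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.production.capacity</name>
      <value>70</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.production.maximum-capacity</name>
      <value>100</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.dev.capacity</name>
      <value>10</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.integration.capacity</name>
      <value>10</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.test.capacity</name>
      <value>10</value>
    </property>

Preemption itself is enabled separately in yarn-site.xml (yarn.resourcemanager.scheduler.monitor.enable=true).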
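The per-tool separation can be sketched with a few commands (database, directory, and namespace names here are hypothetical):

    # Hive: one database per environment (names are placeholders)
    beeline -u jdbc:hive2://hiveserver:10000 -e "CREATE DATABASE myapp_dev"
    beeline -u jdbc:hive2://hiveserver:10000 -e "CREATE DATABASE myapp_test"

    # HDFS: one directory per environment, each with its own space quota
    hdfs dfs -mkdir -p /apps/myapp-dev /apps/myapp-test
    hdfs dfsadmin -setSpaceQuota 500g /apps/myapp-dev

    # HBase: one namespace per environment
    echo "create_namespace 'myapp_dev'" | hbase shell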
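And for the build / push / test step, a minimal Jenkins pipeline sketch (host names, paths, and the test script are placeholders; the SSH plugin mentioned above can replace the plain scp/ssh used here):

    // Jenkinsfile sketch: build with Maven, push the artifact over SSH, run tests
    pipeline {
        agent any
        stages {
            stage('Build') {
                steps {
                    sh 'mvn clean package'
                }
            }
            stage('Push') {
                steps {
                    // assumes SSH keys for deploy@edge-node exist on the agent
                    sh 'scp target/myapp-1.0.jar deploy@edge-node:/opt/jobs/'
                }
            }
            stage('Test') {
                steps {
                    sh 'ssh deploy@edge-node /opt/jobs/run-tests.sh'
                }
            }
        }
    }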


12 REPLIES

Contributor

Hello Everyone,

We are using Scala with Maven to build Spark applications, along with Git as the code repository and Jenkins integrated with Git to build the JAR.

I am not sure how to use Jenkins to deploy our apps on the cluster.

Can anyone explain what the next step could be?

Does Jenkins support deploying Spark apps the way it does other apps?

Thanks
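One common pattern worth exploring is for the Jenkins job to copy the assembled JAR to an edge node and launch it with spark-submit over SSH. A minimal sketch, with hypothetical host, path, queue, and class names:

    # Hypothetical deploy step run by Jenkins; all names are placeholders
    scp target/myapp-1.0.jar deploy@edge-node:/opt/jobs/
    ssh deploy@edge-node spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --queue dev \
      --class com.example.MyApp \
      /opt/jobs/myapp-1.0.jar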

New Contributor

Dear @Fabricio Carboni:

Can you please share some documentation on how we can implement CI/CD for PySpark-based applications? Also, is it possible to do it without using containers (the way we do Java/Scala development: first locally on Windows, then building on Linux dev/test/prod)?

Thanks

Abhinav

New Contributor

Hi, were you able to find a solution to this? We have a similar setup, and I can't seem to find any examples of it.