Big Data Continuous Delivery
Created on 06-06-2016 07:20 PM - edited 09-16-2022 03:23 AM
Hi,
We have started investigating how to implement a Big Data continuous delivery process, and we'd like to know if anyone has already implemented one.
What we need to know is whether there are any best practices for:
- Dev environment
- Build process
- Deployment to the unit-test environment
- Deployment to the integration-test environment
- Deployment to production
Basically we develop with Hive, Python (Spark), shell scripts, Flume, and Sqoop. Once all of the above is defined, we would like to provision these environments in containers and set up continuous integration/deployment via:
Mesos + Jenkins + Marathon + Docker containers, to spin up Docker containers running Hortonworks HDP 2.2.0 (the same as our production environment). A rough sketch of the Marathon step is shown after this post.
Many thanks,
Fabricio
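For the Marathon step specifically, here is a minimal sketch, assuming a reachable Marathon endpoint and a pre-built HDP image in a registry (the host, image name, and resource numbers are placeholders, not something the thread confirms):

```bash
#!/usr/bin/env bash
# Hypothetical CI step: POST a Docker app definition to Marathon's
# REST API to spin up an HDP container; all names are placeholders.
curl -fsS -X POST "http://marathon.example.com:8080/v2/apps" \
  -H "Content-Type: application/json" \
  -d '{
        "id": "/ci/hdp-test-env",
        "container": {
          "type": "DOCKER",
          "docker": { "image": "my-registry/hdp:2.2.0", "network": "HOST" }
        },
        "cpus": 4,
        "mem": 8192,
        "instances": 1
      }'
```

Jenkins could run a step like this after the build stage to bring up a disposable test environment, then tear it down with a DELETE to the same endpoint once the tests finish.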
Created 06-12-2016 11:22 AM
Sure, so basically, regarding the cluster, you may find it useful to:
- Configure queues with the Capacity Scheduler (production, dev, integration, test), and use elasticity and preemption
- Map users to queues
- Use a naming convention for queues and users, e.g. a -dev or -test suffix
- Depending on the tool you are using, separate the environments with (see the sketch after this list):
  - different database names in Hive
  - different directories in HDFS, plus quotas
  - namespaces in HBase
- Ranger will help you configure permissions so that each user/group can access the right resources
- Each user will have different environment settings
- Use Jenkins and Maven (if needed) to build, push the code (with the SSH plugin), and run the tests
- Use templates to provide tools to users, with logging features and the correct parameters and options
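A minimal sketch of that per-environment separation, with placeholder names throughout (the application name, paths, quotas, and queue names are illustrative, not prescribed):

```bash
#!/usr/bin/env bash
# Illustrative environment separation on a shared cluster;
# every name below is a placeholder.
set -euo pipefail

# HDFS: one directory per environment, with space quotas
hdfs dfs -mkdir -p /apps/myapp/dev /apps/myapp/test /apps/myapp/prod
hdfs dfsadmin -setSpaceQuota 500g /apps/myapp/dev
hdfs dfsadmin -setSpaceQuota 2t /apps/myapp/prod

# Hive: one database per environment
beeline -u "jdbc:hive2://hiveserver.example.com:10000" \
  -e "CREATE DATABASE IF NOT EXISTS myapp_dev;
      CREATE DATABASE IF NOT EXISTS myapp_prod;"

# HBase: one namespace per environment
echo "create_namespace 'myapp_dev'" | hbase shell

# YARN: target the environment's Capacity Scheduler queue at submit time
spark-submit --master yarn --queue dev \
  --class com.example.MyApp myapp.jar
```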
Created 09-28-2017 04:58 AM
Hello Everyone,
We are using Scala with Maven to build Spark applications, along with Git as the code repository and Jenkins integrated with Git to build the jar.
I am not sure how to use Jenkins to deploy our apps on the cluster.
Can anyone explain what the next step could be?
Does Jenkins support deploying Spark apps like it does other apps? (A rough sketch follows this post.)
Thanks
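As far as I know there is no Spark-specific deployment step built into Jenkins; a common pattern is a plain shell build step that ships the Maven-built jar to a cluster edge node over SSH and launches it with spark-submit. A minimal sketch, where the host, paths, and main class are placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical Jenkins "Execute shell" build step; host, paths,
# and class name are placeholders.
set -euo pipefail

EDGE_NODE="edge01.example.com"
JAR="target/myapp-1.0-SNAPSHOT.jar"
REMOTE_DIR="/opt/deploy/myapp"

# Ship the artifact built by 'mvn package' to the cluster edge node
scp "$JAR" "jenkins@${EDGE_NODE}:${REMOTE_DIR}/"

# Launch on YARN; --queue selects the target environment's queue
ssh "jenkins@${EDGE_NODE}" \
  "spark-submit --master yarn --deploy-mode cluster \
     --queue test --class com.example.MyApp \
     ${REMOTE_DIR}/$(basename "$JAR")"
```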
Created 12-12-2018 02:09 PM
Dear @Fabricio Carboni:
Can you please share some documentation on how we can implement CI/CD for PySpark-based applications? Also, is it possible to do it without using containers, like we do for development in Java/Scala (first locally on Windows, then building on Linux dev/test/prod)? A rough sketch is included after this post.
Thanks
Abhinav
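Containers are not required for this; one hedged sketch of a containerless PySpark pipeline (the module names and paths are placeholders) is to have the build server zip the Python modules, run the tests, and submit with --py-files, changing only the queue and configuration per environment:

```bash
#!/usr/bin/env bash
# Hedged sketch of a containerless PySpark deploy step; module
# names and paths are placeholders. Assumes library code under
# src/myapp/ and an entry point at src/main.py.
set -euo pipefail

# Bundle the library modules; --py-files ships them to the executors
(cd src && zip -r ../myapp_libs.zip myapp/)

# Run the unit tests on the build machine before deploying
python -m pytest tests/

# Submit to the target environment's queue; the same script serves
# dev/test/prod by varying --queue and the config arguments
spark-submit --master yarn --deploy-mode cluster \
  --queue dev --py-files myapp_libs.zip src/main.py
```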
Created 02-08-2019 02:58 PM
Hi, were you able to find a solution to this? We have a similar setup and I can't seem to find any examples of one.
