Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How are Git & Jenkins related to Hadoop/Spark jobs?

Rising Star

Hi there,

I know Jenkins and Git in general, but I'm not aware of what role they play in Hadoop projects.

Please share any information you have on this. Thanks in advance.

Regards,

Jee

1 ACCEPTED SOLUTION

Super Guru

These tools are used the same way as in any other software development lifecycle (SDLC); the only difference is that the software you develop is executed on a Hadoop/Spark cluster. You still build your jars the same way and use Git as your source code repository. The job is then submitted for execution on a distributed cluster. For development, however, there are pseudo-clusters. For example, you can use the Hadoop mini cluster: https://github.com/sakserv/hadoop-mini-clusters

A good reference on how to use this mini cluster for testing: http://www.lopakalogic.com/articles/hadoop-articles/hadoop-testing-with-minicluster/
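To illustrate the point that a Hadoop job is built and tested like any other software, here is a minimal sketch. It assumes a hypothetical word-count job whose map and reduce logic is kept in plain functions, so a Jenkins build could unit-test it without any cluster. The function names are made up for illustration and are not part of hadoop-mini-clusters or Hadoop itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for each whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts per word, as a reducer would do per key."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Run map then reduce in-process (hypothetical stand-in for a cluster run)."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)
```

A CI job could call run_local on a few sample lines and compare the result with the expected counts; tests against a mini cluster would then cover the parts that really need HDFS or YARN.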

For Spark development, you could use Spark standalone mode.
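As a rough sketch of the submit step, the helper below assembles the spark-submit command that a build job might run once the jar is built. The --master and --class options are standard spark-submit options; the helper name, jar path, class name and host are assumptions for illustration only.

```python
from typing import List, Optional


def spark_submit_command(
    app_jar: str,
    main_class: str,
    master: str = "local[*]",
    app_args: Optional[List[str]] = None,
) -> List[str]:
    """Build the argument list for spark-submit.

    master is "local[*]" for an in-process run on a developer machine, or
    "spark://<host>:7077" for a Spark standalone master (7077 is the default port).
    """
    command = ["spark-submit", "--master", master, "--class", main_class, app_jar]
    command.extend(app_args or [])  # arguments passed through to the application
    return command


# Hypothetical values: the jar, class and host names are placeholders.
dev_cmd = spark_submit_command("target/my-job.jar", "com.example.MyJob")
standalone_cmd = spark_submit_command(
    "target/my-job.jar",
    "com.example.MyJob",
    master="spark://spark-master.example.com:7077",
    app_args=["/data/input"],
)
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where spark-submit is on the PATH, so the same script works for a local run and for a standalone cluster by changing only the master URL.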

