
Spark extra library path in HDFS

Rising Star

Hi, 

 

I would like to know the best way to deploy Spark application dependencies in HDFS.

I have a main.jar that depends on a number of other jars.

I don't want to bundle all of the classes into one big jar file.

Instead, I would like to upload the dependencies to a directory in HDFS and launch the Spark job with spark-submit hdfs://xxx/main.jar, where main.jar itself stays a very thin jar file.

That way, I can easily upgrade individual dependencies by re-uploading just the specific jar file.
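
Roughly what I have in mind (the paths, jar names, and class name below are just placeholders):

# upload the dependencies and the thin application jar to HDFS
hdfs dfs -put lib/dep-a.jar lib/dep-b.jar /libs/
hdfs dfs -put main.jar /apps/

# submit the job, pulling the dependencies and the app jar from HDFS
spark-submit \
  --class com.example.Main \
  --master yarn \
  --deploy-mode cluster \
  --jars hdfs:///libs/dep-a.jar,hdfs:///libs/dep-b.jar \
  hdfs:///apps/main.jar

Then, to upgrade a single dependency, I would only have to re-upload that one jar to /libs/.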

 

Thanks.

3 Replies

Re: Spark extra library path in HDFS

Master Collaborator

Most certainly, you want to package all of your dependencies into one jar. Let Maven or SBT deal with it, because they'll get conflict resolution correct and do so automatically. You don't want to manage this manually across many jars.
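
For example, a minimal sbt-assembly setup looks roughly like this (the plugin and dependency versions below are only illustrative):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

// build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.0" % "provided", // Spark is already on the cluster, keep it out of the jar
  "com.typesafe"     %  "config"     % "1.3.3"               // an ordinary app dependency that does get bundled
)

// Running `sbt assembly` produces a single jar containing your classes plus every
// non-"provided" dependency, with version conflicts resolved by the build tool.

You can then spark-submit that one assembled jar.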

Re: Spark extra library path in HDFS

Rising Star
Hi, srowen. I can use Gradle to package the dependencies into a single jar file. However, I don't want one big jar file, because I want to share common dependencies between different main jars.

Re: Spark extra library path in HDFS

Rising Star
Another reason is to share the jars among users and machines by uploading them to the HDFS cluster.