I would like to know the best way to deploy Spark application dependencies on HDFS.
I have a main.jar that depends on a complex set of jars.
I don't want to package all the classes into one big jar file.
Instead, I would like to upload the dependencies to a directory in HDFS and use spark-submit with hdfs://xxx/main.jar to launch the Spark job, where main.jar itself is a very thin jar file.
That way, I can easily upgrade individual dependencies by re-uploading just the specific jar file.
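To make it concrete, here is a rough sketch of the workflow I have in mind (the HDFS paths, jar names, and main class below are placeholders; as far as I know, spark-submit takes the dependency jars as a comma-separated --jars list and the application jar as the last argument):

```
# Rough sketch of the intended workflow (all paths and names are placeholders)

# 1. upload the dependency jars and the thin main.jar to HDFS
hdfs dfs -put dep-a.jar dep-b.jar hdfs://xxx/deps/
hdfs dfs -put main.jar hdfs://xxx/

# 2. submit, pulling the dependencies from HDFS via --jars
spark-submit \
  --class com.example.Main \
  --master yarn \
  --deploy-mode cluster \
  --jars hdfs://xxx/deps/dep-a.jar,hdfs://xxx/deps/dep-b.jar \
  hdfs://xxx/main.jar
```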
Most certainly, you want to package all of your dependencies into a single .jar. Let Maven or SBT deal with it, because they'll get conflict resolution right and do it automatically. You don't want to manage this by hand across many jars.
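As a minimal sketch of what that looks like with SBT and the sbt-assembly plugin (the project name, versions, and dependencies here are illustrative, not taken from your project):

```scala
// build.sbt — minimal sbt-assembly setup (names and versions are illustrative)
// project/plugins.sbt needs: addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")
name := "my-spark-app"
version := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Spark is "provided": the cluster supplies it at runtime, so it stays out of the fat jar
  "org.apache.spark" %% "spark-core" % "3.3.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.3.2" % "provided"
  // ...your actual dependencies go here and get bundled into the assembly
)

// resolve duplicate files (mostly META-INF noise) when merging all the jars
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}
```

Running `sbt assembly` then produces a single fat jar under `target/scala-2.12/` that you pass to spark-submit. When a dependency changes, you bump its version, rebuild, and re-upload that one artifact instead of juggling many jars in HDFS.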