I created a Spark application that is configured through a set of properties files, which I specify at runtime with the --files option of the spark-submit command.
These local files are automatically copied to the Spark containers, so my job running in the executors can read them and adjust its behavior.
Great, this works like a charm.
Now I want to schedule this spark-submit action to run every hour with Oozie, but I couldn't find out how to pass these configuration files to my Spark job through Oozie. I guess I have to copy the files to HDFS, then have Oozie launch the Spark action and pass it those HDFS files, but I cannot figure out how to do this.
Does anyone have a clue about this?
Thanks a lot for your help,
Sebastien
You can do this by adding --files, pointing at the copies of your files in HDFS, to the spark-opts element of your Spark action:
<spark-opts>--executor-memory 20G --num-executors 50 --files hdfs://(complete hdfs path)</spark-opts>
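For context, here is a sketch of a complete workflow.xml built around that spark-opts line. It assumes the Oozie spark-action 0.1 schema and a YARN cluster. The class com.example.MyJob, the jar location and the properties file paths under /apps/my-job are hypothetical placeholders, as are the ${jobTracker} and ${nameNode} parameters, which you would define in your job.properties. Several files can be passed to --files as a comma-separated list.

```xml
<workflow-app name="spark-with-files" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>MySparkJob</name>
            <!-- Hypothetical main class and jar location -->
            <class>com.example.MyJob</class>
            <jar>${nameNode}/apps/my-job/lib/my-job.jar</jar>
            <!-- Properties files previously copied to HDFS (hypothetical paths),
                 comma-separated when there are several -->
            <spark-opts>--executor-memory 20G --num-executors 50 --files ${nameNode}/apps/my-job/conf/app.properties,${nameNode}/apps/my-job/conf/env.properties</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

To run it every hour, a coordinator with frequency="${coord:hours(1)}" can point its app-path at the HDFS directory containing this workflow.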
As an alternative, you could use a shell action and pass your spark-submit command directly to it.
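A minimal sketch of that alternative, assuming the shell-action 0.2 schema and a hypothetical wrapper script run-spark.sh stored in HDFS that contains your usual spark-submit command, including --files app.properties. The shell action runs on whichever cluster node Oozie picks, so spark-submit must be installed there. The file elements copy the script and the (hypothetical) properties file into the action's working directory, where spark-submit can find them as local files.

```xml
<action name="spark-submit-shell">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Hypothetical wrapper script that calls spark-submit ... -\-files app.properties -->
        <exec>run-spark.sh</exec>
        <!-- Ship the script and the properties file into the working directory (hypothetical paths) -->
        <file>${nameNode}/apps/my-job/run-spark.sh#run-spark.sh</file>
        <file>${nameNode}/apps/my-job/conf/app.properties#app.properties</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```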