We have a PySpark program that ingests source files into HDFS and then loads them into Hive tables.
The program works fine on the edge node because the source files are local to the edge node. Can we schedule this PySpark program through Oozie with spark2-submit --deploy-mode client?
Because Spark 2 is not supported in Oozie actions until CDH 6, and it sounds like these source files may be located specifically on this edge node, I would suggest using the Oozie SSH action to SSH to that edge node. In the SSH action, you would run a script that calls spark2-submit. Please see:
Robert Justice, Technical Resolution Manager
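
For illustration, the script that the SSH action runs on the edge node could look something like the sketch below. This is only a minimal example under stated assumptions, not a tested deployment: the function names `build_submit_command` and `run_ingest` are hypothetical, `--master yarn` is an assumed setting, and `spark2-submit` is assumed to be on the PATH of the user the SSH action logs in as.

```python
import subprocess


def build_submit_command(app_path, app_args=(), master="yarn"):
    """Return the spark2-submit argument list for a client-mode run.

    app_path is the PySpark program on the edge node; master="yarn"
    is an assumption and should match your cluster setup.
    """
    return [
        "spark2-submit",
        "--master", master,
        "--deploy-mode", "client",
        app_path,
    ] + list(app_args)


def run_ingest(app_path, app_args=()):
    """Run spark2-submit and return its exit status.

    Returning the exit status lets the calling script pass it on,
    so a failed Spark job is visible to whatever invoked the script.
    """
    return subprocess.call(build_submit_command(app_path, app_args))
```

A wrapper script on the edge node could then end with `sys.exit(run_ingest(sys.argv[1], sys.argv[2:]))`, so that the script exits non-zero when the Spark job fails.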