pyspark example?

Explorer

Is there a full example anywhere of a PySpark workflow with Oozie? I found examples for Java Spark workflows, but I am not sure how to adapt them to HDP and PySpark.

1 ACCEPTED SOLUTION

Master Mentor

The Oozie Spark action is available in the community, but Hortonworks does not provide support for the Spark action in HDP 2.4 or below. Once support is available, there will be examples of PySpark in Oozie.

5 REPLIES

Explorer

I did not get any errors in a job using http://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html, but it is certainly not obvious how to use it for PySpark.

Master Mentor

@Erik Putrycz I added a PySpark workflow example: https://github.com/dbist/oozie/tree/master/apps/pyspark. It works with HDFS HA, RM HA, Oozie HA, and Kerberos.

Rising Star

@Erik Putrycz To use PySpark, copy the Python file to HDFS and specify its HDFS path in the <jar> tag:

<jar>${nameNode}/user/ambari-qa/examples/apps/spark/lib/pi.py</jar>

You also need to export SPARK_HOME in your hadoop-env.sh.
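
For context, here is a rough sketch of how that <jar> line might sit inside a complete workflow.xml. It is based on the Spark action schema described in the Oozie 4.2.0 documentation linked earlier in this thread, not on a tested HDP setup. The workflow name, action name, and application name are made up for illustration, and ${jobTracker}, ${nameNode}, and ${master} are assumed to be defined in your job.properties.

```xml
<!-- Hypothetical workflow.xml sketch; names are illustrative, not from a tested deployment -->
<workflow-app name="pyspark-example" xmlns="uri:oozie:workflow:0.5">
    <start to="pyspark-node"/>

    <action name="pyspark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <!-- Assumed to be set in job.properties -->
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- For example yarn-client or yarn-cluster, also assumed to come from job.properties -->
            <master>${master}</master>
            <name>Python-Spark-Pi</name>
            <!-- No <class> element: for PySpark the <jar> tag points at the .py file on HDFS -->
            <jar>${nameNode}/user/ambari-qa/examples/apps/spark/lib/pi.py</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The main difference from the Java Spark examples is that the <class> element is left out and <jar> references the Python script instead of a jar file. Whether this runs as-is will depend on your Oozie and Spark versions and on the Spark sharelib being installed.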