
pyspark example?

erik1 — Thu, 31 Mar 2016 11:34:22 GMT

Is there a full example of a pyspark workflow with Oozie anywhere? I found examples for Java Spark workflows, but I am not sure how to adapt them to HDP and pyspark.

Re: pyspark example?

aervits — Thu, 31 Mar 2016 11:50:10 GMT

The Oozie Spark action is available from the community, but Hortonworks does not support the Spark action in HDP 2.4 or below. As soon as it is supported, there will be examples of pyspark in Oozie.

Re: pyspark example?

erik1 — Fri, 01 Apr 2016 08:13:11 GMT

I did not get any errors running a job with this: http://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html but it is certainly not obvious how to use it for pyspark.

Re: pyspark example?

mramasami — Fri, 01 Apr 2016 11:48:30 GMT

@Erik Putrycz To use pyspark, you need to copy the Python file to HDFS and specify its HDFS path in the <jar> tag:

"<jar>${nameNode}/user/ambari-qa/examples/apps/spark/lib/pi.py</jar>"

You also need to export SPARK_HOME in your hadoop-env.sh.
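
For context, here is a minimal sketch of how that <jar> entry might sit inside a complete workflow.xml, based on the Spark action schema described in the Oozie 4.2.0 documentation linked earlier in the thread. The workflow name, the action and node names, and the ${jobTracker} and ${master} properties are assumptions for illustration (they would be defined in your own job.properties), and this has not been verified on HDP.

```xml
<!-- workflow.xml: an illustrative sketch, not a tested HDP configuration -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="pyspark-pi-wf">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <!-- jobTracker and nameNode are assumed to come from job.properties -->
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- master is also assumed to be a job.properties entry, e.g. yarn-cluster -->
            <master>${master}</master>
            <name>pyspark-pi</name>
            <!-- No <class> element here: the Python file goes in <jar> instead -->
            <jar>${nameNode}/user/ambari-qa/examples/apps/spark/lib/pi.py</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The main difference from a Java Spark workflow in this sketch is that the main class is left out and the <jar> element points at the .py file in HDFS.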

Re: pyspark example?

aervits — Fri, 17 Feb 2017 02:00:13 GMT

@Erik Putrycz I added a pyspark workflow example: https://github.com/dbist/oozie/tree/master/apps/pyspark It works with HDFS HA, RM HA, Oozie HA, and Kerberos.

Re: pyspark example?

aervits — Fri, 17 Feb 2017 05:07:34 GMT

@Erik Putrycz Additionally, I added a tutorial here: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html