pyspark example?
Labels: Apache Oozie
Created ‎03-31-2016 04:34 AM
Is there a full example anywhere of a pyspark workflow with Oozie? I found examples for Java Spark workflows, but I am not sure how to adapt them for HDP and pyspark.
Created ‎03-31-2016 04:50 AM
The Oozie Spark action is available in the community, but Hortonworks does not support the Spark action in HDP 2.4 or below. As soon as it is supported, there will be examples of pyspark in Oozie.
Created ‎04-01-2016 01:13 AM
I did not get any errors in a job using http://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html, but it is certainly not obvious how to use it for pyspark.
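For reference, a minimal Spark action workflow along the lines of the linked extension doc looks roughly like this; treat it as a sketch, since the property values (`${master}`, the jar path, the opts) depend on your cluster setup:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-example-wf">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>SparkPi</name>
            <!-- For a Java/Scala job: main class plus the application jar on HDFS -->
            <class>org.apache.spark.examples.SparkPi</class>
            <jar>${nameNode}/user/${wf:user()}/examples/apps/spark/lib/spark-examples.jar</jar>
            <spark-opts>--executor-memory 1G</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The open question is what to put in `<class>` and `<jar>` when the application is a Python script rather than a jar.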
Created ‎02-16-2017 06:00 PM
@Erik Putrycz I added a pyspark workflow example at https://github.com/dbist/oozie/tree/master/apps/pyspark. It works with HDFS HA, ResourceManager HA, Oozie HA, and Kerberos.
Created ‎02-16-2017 09:07 PM
@Erik Putrycz Additionally, I added a tutorial here: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-...
Created ‎04-01-2016 04:48 AM
@Erik Putrycz To use pyspark, you need to copy the Python file to HDFS and specify its HDFS path in the <jar> tag:
"<jar>${nameNode}/user/ambari-qa/examples/apps/spark/lib/pi.py</jar>"
You also need to export SPARK_HOME in your hadoop-env.sh.
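Putting those steps together, the Spark action for a pyspark job would look roughly like the sketch below. This is an assumption based on the advice above, not a verified HDP configuration: the `<class>` element is simply omitted, and `<jar>` points at the `.py` file on HDFS (the `yarn-cluster` master and the `pi.py` path are illustrative):

```xml
<action name="spark-pi">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>PysparkPi</name>
        <!-- No <class> element for pyspark; the Python file itself goes in <jar> -->
        <jar>${nameNode}/user/ambari-qa/examples/apps/spark/lib/pi.py</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Before submitting, the script has to exist at that HDFS path (e.g. via `hdfs dfs -put pi.py /user/ambari-qa/examples/apps/spark/lib/`), matching what the <jar> element references.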
