Follow the steps below to run a SparkR script via Oozie.

1. Install R and its development dependencies on all NodeManager hosts:
yum -y install R R-devel libcurl-devel openssl-devel
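If you have many NodeManager hosts, you can push this install out over SSH. A minimal sketch, assuming passwordless SSH as root and a hypothetical nodemanagers.txt file listing one host per line:

# Install R on every NodeManager host (nodemanagers.txt is hypothetical)
for host in $(cat nodemanagers.txt); do
  ssh "$host" "yum -y install R R-devel libcurl-devel openssl-devel"
done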
2. Prepare your R script
Here is a sample script:
library(SparkR)

# Initialize the SparkR context and the SQL context
sc <- sparkR.init(appName = "SparkR-sample")
sqlContext <- sparkRSQL.init(sc)

# Build a local R data frame and convert it into a Spark DataFrame
localDF <- data.frame(name = c("ABC", "blah", "blah"), age = c(39, 32, 81))
df <- createDataFrame(sqlContext, localDF)
printSchema(df)

sparkR.stop()
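Before wiring the script into Oozie, it can help to sanity-check it with spark-submit directly. A minimal sketch, assuming the HDP Spark client is at the same SPARK_HOME used in the workflow below and spark.R is in the current directory:

# Run the script once outside Oozie to confirm it works on its own
/usr/hdp/2.5.3.0-37/spark/bin/spark-submit --master yarn --deploy-mode client spark.R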
3. Create workflow.xml
Here is a working example:
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkFileCopy'>
    <global>
        <configuration>
            <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
                <value>SPARK_HOME=/usr/hdp/2.5.3.0-37/spark</value>
            </property>
            <property>
                <name>oozie.launcher.mapred.child.env</name>
                <value>SPARK_HOME=/usr/hdp/2.5.3.0-37/spark</value>
            </property>
        </configuration>
    </global>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
            </prepare>
            <master>${master}</master>
            <name>SparkR</name>
            <jar>${nameNode}/user/${wf:user()}/spark.R</jar>
            <spark-opts>--driver-memory 512m --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.3.0</spark-opts>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
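The workflow references ${jobTracker}, ${nameNode}, ${master}, and ${examplesRoot}, which are supplied at submission time, typically through a job.properties file. A minimal sketch; the hostnames, ports, master value, and paths are placeholders you would replace with your cluster's values:

# job.properties (hostnames, ports, and paths are placeholders)
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8050
master=yarn-client
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/sparkr-demo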
4. Make sure you don't have sparkr.zip in the workflow's lib directory, in the Oozie sharelib, or in a <file> tag of the workflow; otherwise it will cause conflicts. One way to check is shown below.
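To check, you can list the Spark sharelib and the workflow's lib directory. A minimal sketch; the Oozie URL and workflow path are placeholders:

# Look for sparkr.zip in the Spark sharelib (Oozie URL is a placeholder)
oozie admin -oozie http://oozie-host:11000/oozie -shareliblist spark | grep -i sparkr
# Check the workflow's lib directory (path is a placeholder)
hdfs dfs -ls /user/$(whoami)/sparkr-demo/lib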
Upload the workflow to HDFS and run it; the commands below show one way to do this. This has been successfully tested on HDP 2.5.x and HDP 2.6.x.
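For reference, the upload and submission might look like this; the paths and Oozie URL are placeholders:

# Upload the workflow definition and the R script (paths are placeholders)
hdfs dfs -mkdir -p /user/$(whoami)/sparkr-demo
hdfs dfs -put workflow.xml /user/$(whoami)/sparkr-demo/
hdfs dfs -put spark.R /user/$(whoami)/
# Submit and run the workflow
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run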
Please comment if you have any feedback, questions, or suggestions.
Happy Hadooping!! 
Reference: https://developer.ibm.com/hadoop/2017/06/30/scheduling-spark-job-written-pyspark-sparkr-yarn-oozie