Community Articles

Please follow the steps below to run a SparkR script via Oozie.


1. Install R and the required packages on all NodeManager nodes

yum -y install R R-devel libcurl-devel openssl-devel


2. Keep your R script ready

Here is a sample script:

library(SparkR)

sc <- sparkR.init(appName="SparkR-sample")
sqlContext <- sparkRSQL.init(sc)
localDF <- data.frame(name=c("ABC", "blah", "blah"), age=c(39, 32, 81))
df <- createDataFrame(sqlContext, localDF)
showDF(df)
sparkR.stop()


3. Create workflow.xml

Here is a working example:

<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkFileCopy'>
    <start to='spark-node'/>
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
            </prepare>
            <master>yarn-cluster</master>
            <name>SparkR-sample</name>
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/script.R</jar>
            <spark-opts>--driver-memory 512m --conf spark.driver.extraJavaOptions=-Dhdp.version=</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end'/>
</workflow-app>


4. Make sure that you don't have duplicate copies of the Spark/SparkR libraries in the workflow's lib directory, in the Oozie sharelib, or in a <file> tag in the workflow, or else they will cause conflicts.
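For completeness, the workflow above is typically driven by a job.properties file. A minimal sketch is shown below; the host names, ports, and the examplesRoot path are placeholders you must adjust for your cluster:

nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8050
examplesRoot=sparkr-example
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}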


Upload the workflow to HDFS and run it. It should work. This has been successfully tested on HDP-2.5.x and HDP-2.6.x.
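The upload and submission steps might look like the following; the HDFS path and Oozie server URL are assumptions for your environment, and <job-id> is the id printed by the -run command:

hdfs dfs -put workflow.xml script.R /user/$(whoami)/sparkr-example/
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
oozie job -oozie http://oozie-host:11000/oozie -info <job-id>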


Please comment if you have any feedback/questions/suggestions.

Happy Hadooping!! :)


Version history
Last update:
10-16-2017 10:02 PM