Oozie Spark Action on Yarn - HADOOP_CONF_DIR or YARN_CONF_DIR?

New Contributor

Hello, I'm currently learning to use the Spark action with Oozie on CDH 5.8.

The workflow runs fine with master=local[*] and mode=client. However, things seem very different with YARN client/cluster mode. When I run the job, I get:

 

2016-09-20 06:04:14,028 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[master.meshiang] USER[root] GROUP[-] TOKEN[] APP[CSV] JOB[0000007-160920052847518-oozie-oozi-W] ACTION[0000007-160920052847518-oozie-oozi-W@spark-2bab] Launcher exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
	at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
	at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:256)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:207)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

I know I have to set HADOOP_CONF_DIR or YARN_CONF_DIR, but how and where?

 

What I already tried:

  1. Following the spark-opts configuration from http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SparkActionExtension.html#Spark_on_YARN. In the Spark Action > Options tab in Hue, I entered the following configuration:

    --conf spark.yarn.historyServer.address=http://datanode1.meshiang:18088
    --conf spark.eventLog.dir=${nameNode}/user/spark/applicationHistory
    --conf spark.eventLog.enabled=true

    I don't know whether this is still necessary, since this feature is already included in CDH 5.7.2 [OOZIE-2170].
  2. Setting HADOOP_CONFIG_DIR and YARN_CONFIG_DIR on the Oozie server node using:

    export HADOOP_CONFIG_DIR=/etc/hadoop/conf
    export YARN_CONFIG_DIR=/etc/hadoop/conf
  3. Setting HADOOP_CONFIG_DIR and YARN_CONFIG_DIR in the Spark Action spark-opts:

    --conf spark.yarn.appMasterEnv.HADOOP_CONFIG_DIR=/etc/hadoop/conf
    --conf spark.yarn.appMasterEnv.YARN_CONFIG_DIR=/etc/hadoop/conf
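
A note on attempts 2 and 3: the stack trace shows the exception being raised by SparkSubmitArguments inside the Oozie launcher task (LauncherMapper.map calls SparkMain, which calls SparkSubmit), so it is that task's environment that is inspected. An export typed into a shell on the Oozie server would not normally reach that process, and spark.yarn.appMasterEnv.* sets variables for the YARN application master rather than for the submitting process. The error message also names HADOOP_CONF_DIR and YARN_CONF_DIR, not the *_CONFIG_DIR names used above. The shell sketch below imitates the check as the error message describes it; check_yarn_env is a hypothetical helper written for illustration, not a Spark or Oozie command, and the real logic may differ in detail.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark or Oozie): mimics the validation
# described by the error message. For a master starting with "yarn", at least
# one of HADOOP_CONF_DIR / YARN_CONF_DIR must exist in the process environment.
check_yarn_env() {
  local master="$1"
  case "$master" in
    yarn*)
      # printenv exits non-zero when the variable is absent from the environment.
      if ! printenv HADOOP_CONF_DIR >/dev/null && ! printenv YARN_CONF_DIR >/dev/null; then
        echo "When running with master '$master' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment." >&2
        return 1
      fi
      ;;
  esac
  return 0
}

# The variable names from attempt 2 are not the ones the check looks for.
( unset HADOOP_CONF_DIR YARN_CONF_DIR
  export HADOOP_CONFIG_DIR=/etc/hadoop/conf
  check_yarn_env yarn-client 2>/dev/null || echo "yarn-client rejected" )

# With the name from the error message, the same call passes.
( export HADOOP_CONF_DIR=/etc/hadoop/conf
  check_yarn_env yarn-client && echo "yarn-client accepted" )
```

A local master skips the check entirely, which is consistent with master=local[*] working in the same workflow.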

 

PS: I'm using Oozie, Spark, and MRv1 (for running the Oozie Launcher) from CDH 5.8 without changing any of their default settings.


1 ACCEPTED SOLUTION

Mentor
Yes, you need to switch Oozie to submit over YARN and not MRv1. The
switching guide covers this aspect.


3 REPLIES

Mentor
You cannot run Spark on MR1 clusters. You will need a YARN cluster setup
first, and Oozie switched over to that, before you can attempt the Spark
action.

To migrate to YARN, please follow
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_mr_and_yarn.html#xd_583c10bfdb...
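
One way to confirm which runtime hosted the Oozie launcher is to look at the bottom frames of the stack trace. The sketch below assumes the usual class names: MRv1 task JVMs start in org.apache.hadoop.mapred.Child (as in the trace in the question), while MapReduce tasks on YARN start in org.apache.hadoop.mapred.YarnChild. launcher_framework is a hypothetical helper for illustration only, and the match is a heuristic rather than an official diagnostic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a task stack trace on stdin and guesses which
# MapReduce runtime started the task JVM, based on the entry-point class.
launcher_framework() {
  local trace
  trace="$(cat)"
  if printf '%s\n' "$trace" | grep -q 'org\.apache\.hadoop\.mapred\.YarnChild'; then
    echo "YARN (MRv2)"
  elif printf '%s\n' "$trace" | grep -q 'org\.apache\.hadoop\.mapred\.Child\.main'; then
    echo "MRv1"
  else
    echo "unknown"
  fi
}

# Last frame of the trace posted in the question.
printf '%s\n' 'at org.apache.hadoop.mapred.Child.main(Child.java:262)' | launcher_framework
```

For the frame quoted from the question this prints MRv1, which matches the poster's note that the Oozie Launcher runs on MRv1.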

New Contributor

Thank you for the response!

Sorry, I forgot to mention that I already have YARN on my cluster. The Spark job runs fine when I submit it from a terminal with spark-submit --master yarn --deploy-mode cluster. However, when I run it through an Oozie workflow, Oozie fails with the error above.

Do you mean that I need to move my Oozie Launcher to MRv2/YARN?

 

Mentor
Yes, you need to switch Oozie to submit over YARN and not MRv1. The
switching guide covers this aspect.