Archives of Support Questions (Read Only)

Stefanie · 09-20-2016

Hello, I'm currently learning to use the Spark action with Oozie on CDH 5.8.

The workflow runs fine with master=local[*] and mode=client. However, it seems to behave very differently with YARN client/cluster mode. When I run the job, I get:

2016-09-20 06:04:14,028 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[master.meshiang] USER[root] GROUP[-] TOKEN[] APP[CSV] JOB[0000007-160920052847518-oozie-oozi-W] ACTION[0000007-160920052847518-oozie-oozi-W@spark-2bab] Launcher exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
	at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
	at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:256)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:207)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

I know I have to set HADOOP_CONF_DIR or YARN_CONF_DIR, but how and where?

What I already tried:

Following the spark-opts configuration from http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SparkActionExtension.html#Spark_on_YARN. In the Spark Action > Options tab in Hue, I put the following configuration:
```
--conf spark.yarn.historyServer.address=http://datanode1.meshiang:18088
--conf spark.eventLog.dir=${nameNode}/user/spark/applicationHistory
--conf spark.eventLog.enabled=true
```
I don't know whether this is necessary, since this feature is already included in CDH 5.7.2 [OOZIE-2170].

Specifying HADOOP_CONFIG_DIR and YARN_CONFIG_DIR on the Oozie server node using:

export HADOOP_CONFIG_DIR=/etc/hadoop/conf
export YARN_CONFIG_DIR=/etc/hadoop/conf

Specifying HADOOP_CONFIG_DIR and YARN_CONFIG_DIR in the Spark Action spark-opts

--conf spark.yarn.appMasterEnv.HADOOP_CONFIG_DIR=/etc/hadoop/conf
--conf spark.yarn.appMasterEnv.YARN_CONFIG_DIR=/etc/hadoop/conf

PS: I'm using Oozie, Spark, and MRv1 (for running the Oozie launcher) from CDH 5.8 without changing any of their default settings.
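
A note on the attempts above: the exception names HADOOP_CONF_DIR and YARN_CONF_DIR, while the exports and spark-opts use HADOOP_CONFIG_DIR and YARN_CONFIG_DIR, which are different names. The stack trace also shows the check running inside the Oozie launcher's map task (LauncherMapper.map), so the variable has to be visible in that process's environment. An export in a shell on the Oozie server, or spark.yarn.appMasterEnv.* (which targets the YARN application master), would likely not reach it. The shell sketch below only mimics the check to make this concrete. The function name check_yarn_conf_env is hypothetical, and the logic is an approximation written from the error message, not Spark's actual code.

```shell
#!/bin/sh
# Sketch only: a hypothetical helper that mimics the check reported in the
# stack trace. It is an approximation, not Spark's implementation.
check_yarn_conf_env() {
    master="$1"
    case "$master" in
        yarn*)
            # ${VAR+set} expands to "set" only when VAR is defined, so both
            # tests succeed only when neither variable exists in the environment.
            if [ -z "${HADOOP_CONF_DIR+set}" ] && [ -z "${YARN_CONF_DIR+set}" ]; then
                echo "When running with master '$master' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment." >&2
                return 1
            fi
            ;;
    esac
    # Non-YARN masters such as local[*] are not checked at all.
    return 0
}
```

Sourced into a shell where only HADOOP_CONFIG_DIR is exported, check_yarn_conf_env yarn-client returns non-zero and prints the message from the log. With HADOOP_CONF_DIR or YARN_CONF_DIR exported it returns zero, and a local[*] master passes regardless, which matches the behaviour described in the question.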

Harsh J · 09-20-2016

You cannot run Spark on MR1 clusters. You will need a YARN cluster setup
first, and Oozie switched over to that, before you can attempt the Spark
action.

To migrate to YARN, please follow
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_mr_and_yarn.html#xd_583c10bfdb...

Stefanie · 09-20-2016

Thank you for the response!

I'm sorry, I forgot to mention that I already have YARN on my cluster. I can run the Spark job fine from a terminal using spark-submit --master yarn --deploy-mode cluster. However, when I run it through an Oozie workflow, Oozie fails with the error above.

Do you mean that I need to move my Oozie launcher to use MRv2/YARN?

Harsh J · 09-20-2016

Yes, you need to switch Oozie to submit over YARN and not MRv1. The
switching guide covers this aspect.
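
For readers submitting from the command line rather than Hue, one client-side consequence of the switch can be sketched as follows: once Oozie submits over YARN, the workflow's jobTracker property is expected to point at the YARN ResourceManager instead of an MRv1 JobTracker. The helper below is a minimal sketch under that assumption. The function name write_job_properties is hypothetical, and all host names, ports, and paths are placeholders chosen by the caller, not values taken from this thread. It does not perform the server-side switch that the guide describes.

```shell
#!/bin/sh
# Sketch only: writes an example job.properties for a workflow that is
# submitted over YARN. All arguments are assumptions supplied by the caller.
write_job_properties() {
    rm_address="$1"   # ResourceManager host:port (placeholder)
    name_node="$2"    # NameNode URI (placeholder)
    app_path="$3"     # HDFS path of the workflow application (placeholder)
    out_file="$4"     # local file to write
    # \${nameNode} is escaped so Oozie, not the shell, resolves it later.
    cat > "$out_file" <<EOF
nameNode=$name_node
jobTracker=$rm_address
oozie.use.system.libpath=true
oozie.wf.application.path=\${nameNode}$app_path
EOF
}
```

As a usage example with made-up values, write_job_properties rm.example.com:8032 hdfs://nn.example.com:8020 /user/root/apps/csv job.properties produces a file that could then be passed to oozie job -config job.properties -run, together with the -oozie URL of the Oozie server. Port 8032 is the usual ResourceManager default, but the actual address should be taken from the cluster's configuration.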

Cloudera Community

Oozie Spark Action on Yarn - HADOOP_CONF_DIR or YARN_CONF_DIR?