Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Oozie Spark Action on Yarn - HADOOP_CONF_DIR or YARN_CONF_DIR?

Visitor

Hello, I'm currently learning to use Spark Action with Oozie using CDH 5.8.

I'm running the workflow fine with master=local[*] and mode=client. However, it seems to behave very differently with YARN client/cluster mode. When I run the job, I get:

2016-09-20 06:04:14,028 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[master.meshiang] USER[root] GROUP[-] TOKEN[] APP[CSV] JOB[0000007-160920052847518-oozie-oozi-W] ACTION[0000007-160920052847518-oozie-oozi-W@spark-2bab] Launcher exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
	at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
	at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:256)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:207)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

I know I have to specify HADOOP_CONFIG_DIR and YARN_CONFIG_DIR, but how and where?

What I already tried:

  1. Following the spark-opts configuration at http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SparkActionExtension.html#Spark_on_YARN, I put the following configuration in the Spark Action > Options tab in Hue:

    --conf spark.yarn.historyServer.address=http://datanode1.meshiang:18088
    --conf spark.eventLog.dir=${nameNode}/user/spark/applicationHistory
    --conf spark.eventLog.enabled=true

    I don't know whether this is still necessary, since this feature is already included in CDH 5.7.2 [OOZIE-2170].
  2. Specifying HADOOP_CONFIG_DIR and YARN_CONFIG_DIR on the Oozie server node using:

    export HADOOP_CONFIG_DIR=/etc/hadoop/conf
    export YARN_CONFIG_DIR=/etc/hadoop/conf
  3. Specifying HADOOP_CONFIG_DIR and YARN_CONFIG_DIR in the Spark Action spark-opts

    --conf spark.yarn.appMasterEnv.HADOOP_CONFIG_DIR=/etc/hadoop/conf
    --conf spark.yarn.appMasterEnv.YARN_CONFIG_DIR=/etc/hadoop/conf
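
For comparison, the error message above names the variables HADOOP_CONF_DIR and YARN_CONF_DIR. The following is a minimal sketch, assuming a POSIX shell, of how one might check whether either of them is visible to a given process. The helper name check_conf_dirs is hypothetical. It only inspects the environment of the shell it runs in, which may differ from the environment of the Oozie launcher, since the stack trace shows the launcher running inside a map task (LauncherMapper.map).

```shell
# Report whether either variable named in the Spark error message is set
# in the environment of the shell that calls this function.
# Returns 0 if at least one is set and non-empty, 1 otherwise.
check_conf_dirs() {
    found=1
    for name in HADOOP_CONF_DIR YARN_CONF_DIR; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
            found=0
        else
            echo "$name is not set"
        fi
    done
    return "$found"
}
```

Calling check_conf_dirs after an export prints one line per variable, so a misspelled variable name would show up as "is not set".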

 

PS: I'm using Oozie, Spark, and MRv1 (for running the Oozie Launcher) from CDH 5.8, without changing any of their default settings.

1 ACCEPTED SOLUTION

Mentor
Yes, you need to switch Oozie to submit over YARN and not MRv1. The
switching guide covers this aspect.


3 REPLIES

Mentor
You cannot run Spark on MR1 clusters. You will need a YARN cluster setup
first, and Oozie switched over to that, before you can attempt the Spark
action.

To migrate to YARN, please follow
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_mr_and_yarn.html#xd_583c10bfdb...

Visitor

Thank you for the response!

I'm sorry, I forgot to mention that I already have YARN on my cluster. The Spark job runs fine when I submit it from a terminal using spark-submit --master yarn --deploy-mode cluster. However, when I run it through an Oozie workflow, Oozie fails with the error above.

Do you mean that I need to move my Oozie Launcher to use MRv2 / YARN?

Mentor
Yes, you need to switch Oozie to submit over YARN and not MRv1. The
switching guide covers this aspect.
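
The stack trace in the question already hints at this: its last frame is org.apache.hadoop.mapred.Child.main, the task entry point under MRv1, whereas tasks launched on YARN (MRv2) start in org.apache.hadoop.mapred.YarnChild. A minimal sketch of that check, assuming a POSIX shell and a launcher stack trace saved to a file; the function name launcher_runtime is hypothetical:

```shell
# Guess which MapReduce runtime ran the Oozie launcher, based on the
# task entry point that appears in a saved stack trace.
#   org.apache.hadoop.mapred.YarnChild.main -> MRv2 (YARN)
#   org.apache.hadoop.mapred.Child.main     -> MRv1
launcher_runtime() {
    trace_file=$1
    if grep -q 'org\.apache\.hadoop\.mapred\.YarnChild\.main' "$trace_file"; then
        echo "MRv2 (YARN)"
    elif grep -q 'org\.apache\.hadoop\.mapred\.Child\.main' "$trace_file"; then
        echo "MRv1"
    else
        echo "unknown"
    fi
}
```

Running launcher_runtime on a file containing the trace from the question should report MRv1, which matches the advice to switch Oozie over to YARN before using the Spark action.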