Created on 09-20-2016 06:34 AM - edited 09-16-2022 03:40 AM
Hello, I'm currently learning to use Spark Action with Oozie using CDH 5.8.
I'm running the workflow fine with master=local[*] and mode=client. However, it seems very different with YARN client/cluster mode. When I run the job, I get:
2016-09-20 06:04:14,028 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[master.meshiang] USER[root] GROUP[-] TOKEN[] APP[CSV] JOB[0000007-160920052847518-oozie-oozi-W] ACTION[0000007-160920052847518-oozie-oozi-W@spark-2bab] Launcher exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
    at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
    at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:256)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:207)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:52)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
I know I have to specify HADOOP_CONFIG_DIR and YARN_CONFIG_DIR, but how and where?
What I already tried:
--conf spark.yarn.historyServer.address=http://datanode1.meshiang:18088 --conf spark.eventLog.dir=${nameNode}/user/spark/applicationHistory --conf spark.eventLog.enabled=true
I don't know whether this is necessary, since this feature is already included in CDH 5.7.2 [OOZIE-2170].
export HADOOP_CONFIG_DIR=/etc/hadoop/conf
export YARN_CONFIG_DIR=/etc/hadoop/conf
--conf spark.yarn.appMasterEnv.HADOOP_CONFIG_DIR=/etc/hadoop/conf --conf spark.yarn.appMasterEnv.YARN_CONFIG_DIR=/etc/hadoop/conf
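One detail worth checking in the attempts above: the error message names HADOOP_CONF_DIR and YARN_CONF_DIR, while the exports and --conf options set HADOOP_CONFIG_DIR and YARN_CONFIG_DIR. The following is only a minimal shell sketch of that check, written from the wording of the error message and not from the Spark source; the function names has_yarn_conf_dir and describe_conf_dir are hypothetical. Note also that a variable exported in a terminal is visible only to processes started from that shell, so it would not automatically reach a launcher task running elsewhere on the cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if at least one of the two variables
# named in the error message is set to a non-empty value.
has_yarn_conf_dir() {
    [ -n "${HADOOP_CONF_DIR:-}" ] || [ -n "${YARN_CONF_DIR:-}" ]
}

# Hypothetical helper: prints a one-line summary of the check above.
describe_conf_dir() {
    if has_yarn_conf_dir; then
        echo "ok: HADOOP_CONF_DIR or YARN_CONF_DIR is set"
    else
        echo "missing: neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set"
    fi
}

# Report the state of the current environment.
describe_conf_dir
```

With only HADOOP_CONFIG_DIR exported, this sketch still reports the variables as missing, because the names differ from the ones in the error message.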
PS: I'm using Oozie, Spark, and MRv1 (for running the Oozie launcher) from CDH 5.8 without changing any of the default configuration.
Created 09-20-2016 06:55 AM
Created 09-20-2016 06:47 AM
Created on 09-20-2016 06:52 AM - edited 09-20-2016 06:54 AM
Thank you for the response!
I'm sorry, I forgot to mention that I already have YARN on my cluster. The Spark job runs fine from the terminal with spark-submit --master yarn --deploy-mode cluster. However, when I run it through an Oozie workflow, Oozie fails with the error above.
Do you mean that I need to move my Oozie launcher to MRv2 / YARN?