Created 02-22-2017 11:39 PM
The Spark batch job worked well when submitted with the following shell command:
spark-submit --class com.raiyi.spark.smscount.batch.SmsStatBy3DayDrive \
  --master yarn-cluster \
  --num-executors 5 \
  --driver-memory 3g \
  --executor-memory 3g \
  --executor-cores 1 \
  --conf "spark.driver.extraClassPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar" \
  --conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar" \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=1024m -XX:PermSize=256m" \
  --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=1024m -XX:PermSize=256m" \
  spark_demo-1.0-SNAPSHOT-shaded.jar 20170219
But when I submit the same Spark batch job through Oozie, an exception occurs. Here is the log:
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH<CPS>/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*, SPARK_LOG_URL_STDERR -> http://datanode3:8042/node/containerlogs/container_1487752257960_0334_02_000006/zhuj/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1487752257960_0334, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 98343005,176766064, SPARK_USER -> zhuj, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PUBLIC, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1487832656812,1487815532194, SPARK_LOG_URL_STDOUT -> http://datanode3:8042/node/containerlogs/container_1487752257960_0334_02_000006/zhuj/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://nameservice/user/zhuj/.sparkStaging/application_1487752257960_0334/spark-assembly.jar#__spark__.jar,hdfs://nameservice/user/zhuj/batchjars/spark_demo-1.0-SNAPSHOT-shaded.jar#__app__.jar)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH<CPS>/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*, SPARK_LOG_URL_STDERR -> http://namenode:8042/node/containerlogs/container_1487752257960_0334_02_000003/zhuj/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1487752257960_0334, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 98343005,176766064, SPARK_USER -> zhuj, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PUBLIC, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1487832656812,1487815532194, SPARK_LOG_URL_STDOUT -> http://namenode:8042/node/containerlogs/container_1487752257960_0334_02_000003/zhuj/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://nameservice/user/zhuj/.sparkStaging/application_1487752257960_0334/spark-assembly.jar#__spark__.jar,hdfs://nameservice/user/zhuj/batchjars/spark_demo-1.0-SNAPSHOT-shaded.jar#__app__.jar)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH<CPS>/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*, SPARK_LOG_URL_STDERR -> http://datanode:8042/node/containerlogs/container_1487752257960_0334_02_000004/zhuj/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1487752257960_0334, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 98343005,176766064, SPARK_USER -> zhuj, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PUBLIC, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1487832656812,1487815532194, SPARK_LOG_URL_STDOUT -> http://datanode:8042/node/containerlogs/container_1487752257960_0334_02_000004/zhuj/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://nameservice/user/zhuj/.sparkStaging/application_1487752257960_0334/spark-assembly.jar#__spark__.jar,hdfs://nameservice/user/zhuj/batchjars/spark_demo-1.0-SNAPSHOT-shaded.jar#__app__.jar)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH<CPS>/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*, SPARK_LOG_URL_STDERR -> http://datanode2:8042/node/containerlogs/container_1487752257960_0334_02_000002/zhuj/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1487752257960_0334, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 98343005,176766064, SPARK_USER -> zhuj, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PUBLIC, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1487832656812,1487815532194, SPARK_LOG_URL_STDOUT -> http://datanode2:8042/node/containerlogs/container_1487752257960_0334_02_000002/zhuj/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://nameservice/user/zhuj/.sparkStaging/application_1487752257960_0334/spark-assembly.jar#__spark__.jar,hdfs://nameservice/user/zhuj/batchjars/spark_demo-1.0-SNAPSHOT-shaded.jar#__app__.jar)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH<CPS>/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*, SPARK_LOG_URL_STDERR -> http://datanode0:8042/node/containerlogs/container_1487752257960_0334_02_000005/zhuj/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1487752257960_0334, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 98343005,176766064, SPARK_USER -> zhuj, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PUBLIC, SPARK_YARN_MODE -> true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1487832656812,1487815532194, SPARK_LOG_URL_STDOUT -> http://datanode0:8042/node/containerlogs/container_1487752257960_0334_02_000005/zhuj/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://nameservice/user/zhuj/.sparkStaging/application_1487752257960_0334/spark-assembly.jar#__spark__.jar,hdfs://nameservice/user/zhuj/batchjars/spark_demo-1.0-SNAPSHOT-shaded.jar#__app__.jar)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with commands: List({{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms3072m, -Xmx3072m, '-XX:PermSize=1024m', -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.driver.port=40826', '-Dspark.ui.port=0', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://sparkDriver@datanode3:40826/user/CoarseGrainedScheduler, --executor-id, 1, --hostname, datanode2, --cores, 1, --app-id, application_1487752257960_0334, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with commands: List({{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms3072m, -Xmx3072m, '-XX:PermSize=1024m', -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.driver.port=40826', '-Dspark.ui.port=0', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://sparkDriver@datanode3:40826/user/CoarseGrainedScheduler, --executor-id, 5, --hostname, datanode3, --cores, 1, --app-id, application_1487752257960_0334, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with commands: List({{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms3072m, -Xmx3072m, '-XX:PermSize=1024m', -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.driver.port=40826', '-Dspark.ui.port=0', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://sparkDriver@datanode3:40826/user/CoarseGrainedScheduler, --executor-id, 3, --hostname, datanode, --cores, 1, --app-id, application_1487752257960_0334, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with commands: List({{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms3072m, -Xmx3072m, '-XX:PermSize=1024m', -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.driver.port=40826', '-Dspark.ui.port=0', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://sparkDriver@datanode3:40826/user/CoarseGrainedScheduler, --executor-id, 2, --hostname, namenode, --cores, 1, --app-id, application_1487752257960_0334, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
17/02/23 14:51:46 INFO yarn.ExecutorRunnable: Setting up executor with commands: List({{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms3072m, -Xmx3072m, '-XX:PermSize=1024m', -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.driver.port=40826', '-Dspark.ui.port=0', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://sparkDriver@datanode3:40826/user/CoarseGrainedScheduler, --executor-id, 4, --hostname, datanode0, --cores, 1, --app-id, application_1487752257960_0334, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
17/02/23 14:51:46 INFO impl.ContainerManagementProtocolProxy: Opening proxy : namenode:8041
17/02/23 14:51:46 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode:8041
17/02/23 14:51:46 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode2:8041
17/02/23 14:51:46 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode3:8041
17/02/23 14:51:46 INFO impl.ContainerManagementProtocolProxy: Opening proxy : datanode0:8041
17/02/23 14:51:49 INFO cluster.YarnClusterSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@datanode:36494/user/Executor#311844886] with ID 3
17/02/23 14:51:49 INFO storage.BlockManagerMasterActor: Registering block manager datanode:40821 with 1589.8 MB RAM, BlockManagerId(3, datanode, 40821)
17/02/23 14:51:49 INFO cluster.YarnClusterSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@datanode0:44241/user/Executor#-1406859909] with ID 4
17/02/23 14:51:49 INFO cluster.YarnClusterSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@datanode2:35227/user/Executor#66771502] with ID 1
17/02/23 14:51:49 INFO storage.BlockManagerMasterActor: Registering block manager datanode0:34517 with 1589.8 MB RAM, BlockManagerId(4, datanode0, 34517)
17/02/23 14:51:49 INFO storage.BlockManagerMasterActor: Registering block manager datanode2:59608 with 1589.8 MB RAM, BlockManagerId(1, datanode2, 59608)
17/02/23 14:51:50 INFO cluster.YarnClusterSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@datanode3:40349/user/Executor#1475870089] with ID 5
17/02/23 14:51:50 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/02/23 14:51:50 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/02/23 14:51:50 ERROR yarn.ApplicationMaster: User class threw exception: org/apache/hadoop/hive/conf/HiveConf
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
    at com.raiyi.spark.smscount.batch.SmsStatBy3DayDrive$.main(SmsStatBy3DayDrive.scala:87)
    at com.raiyi.spark.smscount.batch.SmsStatBy3DayDrive.main(SmsStatBy3DayDrive.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
The driver fails with java.lang.NoClassDefFoundError for org/apache/hadoop/hive/conf/HiveConf, so it seems the Hive jars never make it onto the classpath. How should I set the Spark extra options? My Oozie workflow.xml is here:
<workflow-app name="SmsStatBy3DayDrive" xmlns="uri:oozie:workflow:0.5">
    <global>
        <configuration>
            <property>
                <name></name>
                <value></value>
            </property>
        </configuration>
    </global>
    <start to="spark-3b65"/>
    <kill name="Kill">
        <message>Action failed, error message [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-3b65">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <mode>cluster</mode>
            <name>SmsStatBy3DayDrive</name>
            <class>com.raiyi.spark.smscount.batch.SmsStatBy3DayDrive</class>
            <jar>${nameNode}/user/zhuj/batchjars/spark_demo-1.0-SNAPSHOT-shaded.jar</jar>
            <spark-opts>--num-executors 5 --driver-memory 3g --executor-memory 3g --executor-cores 1 --conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/htrace-core-3.1.0-incubating.jar:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar --conf spark.driver.extraJavaOptions=-XX:PermSize=1024m --conf spark.executor.extraJavaOptions=-XX:PermSize=1024m</spark-opts>
            <arg>${executeDate}</arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
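For reference, I submit the workflow roughly like this. This is only a sketch: the Oozie server URL and the job.properties name are example values, not the exact ones from my environment. Setting oozie.use.system.libpath=true is supposed to make Oozie add its Spark sharelib jars to the action's classpath:

    # Sketch of the submission; the Oozie URL below is an assumed example value.
    oozie job -oozie http://oozieserver:11000/oozie \
      -config job.properties \
      -Doozie.use.system.libpath=true \
      -run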
Please help me!
Created on 02-22-2017 11:47 PM - edited 02-22-2017 11:48 PM
CDH version is 5.4.7
Oozie workflow schema 0.5
Spark 1.3.0
Created 03-02-2017 01:38 AM
The reason is that this --conf setting does not take effect:
spark.driver.extraClassPath
How can I work around this?
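One workaround I am considering (just a sketch, with assumed paths): instead of relying on spark.driver.extraClassPath, let Oozie ship the Hive jars itself. Oozie adds every jar under the workflow's lib/ directory on HDFS to the action's classpath, so the --conf extraClassPath entries in <spark-opts> should no longer be needed. The workflow directory below is an assumption; adjust it to wherever workflow.xml actually lives:

    # Assumed HDFS location of workflow.xml; change to the real workflow directory.
    WF_DIR=/user/zhuj/workflows/SmsStatBy3DayDrive

    # Stage the Hive jars and hive-site.xml next to the workflow; Oozie puts
    # everything under lib/ on the launcher and container classpath.
    hdfs dfs -mkdir -p ${WF_DIR}/lib
    hdfs dfs -put /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/lib/*.jar ${WF_DIR}/lib/
    hdfs dfs -put /opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hive/conf/hive-site.xml ${WF_DIR}/lib/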