Created on 10-02-2014 10:05 AM
Hi all,
We are running Spark on a Kerberized CDH 5.1.3 cluster managed by CM 5.1.3, and we are unable to execute even simple spark-shell commands:
[root@clouderamain ~]# source /etc/spark/conf/spark-env.sh
[root@clouderamain ~]# export SPARK_PRINT_LAUNCH_COMMAND=1
[root@clouderamain ~]# spark-shell --verbose --master yarn-client

scala> sc.setLocalProperty("yarn.nodemanager.delete.debug-delay-sec", "36000")
scala> val textFile = sc.textFile("salaries.java")
scala> textFile.count()
We get the following error upon execution:
WARN TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
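For what it's worth, this is a quick way to check whether the class is even present on a node (a rough sketch, not verified on this cluster; paths assume the CDH 5.1.3 parcel layout that appears in the logs below):

# which jar should provide org.apache.hadoop.mapred.JobConf? (parcel layout assumed)
PARCEL=/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12
find "$PARCEL/lib/hadoop-mapreduce" -maxdepth 1 -name 'hadoop-mapreduce-client-core*.jar'
# confirm the class is actually inside the jar
unzip -l "$PARCEL"/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-*.jar | grep 'mapred/JobConf.class'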
The SparkPi example runs without issue. Here are the full logs from our failing job:
[root@clouderamain lib]# spark-shell --verbose --master yarn-client
Spark Command: /usr/java/default/bin/java -cp ::/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/conf:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/assembly/lib/*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/examples/lib/*:/etc/hadoop/conf:/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/../lib/hadoop/../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/../lib/hadoop/../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/../lib/hadoop/../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/../lib/hadoop/../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/../lib/hadoop/../hadoop-yarn/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/lib/jline.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --verbose --master yarn-client --class org.apache.spark.repl.Main
========================================
Using properties file: /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/conf/spark-defaults.conf
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.eventLog.dir=/user/spark/applicationHistory
Adding default property: spark.master=spark://clouderamain.cluster.local:7077
Using properties file: /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/conf/spark-defaults.conf
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.eventLog.dir=/user/spark/applicationHistory
Adding default property: spark.master=spark://clouderamain.cluster.local:7077
Parsed arguments:
  master                  yarn-client
  deployMode              null
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.repl.Main
  primaryResource         spark-shell
  name                    org.apache.spark.repl.Main
  childArgs               []
  jars                    null
  verbose                 true
Default properties from /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/conf/spark-defaults.conf:
  spark.eventLog.enabled -> true
  spark.eventLog.dir -> /user/spark/applicationHistory
  spark.master -> spark://clouderamain.cluster.local:7077
Using properties file: /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark/conf/spark-defaults.conf
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.eventLog.dir=/user/spark/applicationHistory
Adding default property: spark.master=spark://clouderamain.cluster.local:7077
Main class:
org.apache.spark.repl.Main
Arguments:

System properties:
  spark.eventLog.enabled -> true
  SPARK_SUBMIT -> true
  spark.app.name -> org.apache.spark.repl.Main
  spark.jars ->
  spark.eventLog.dir -> /user/spark/applicationHistory
  spark.master -> yarn-client
Classpath elements:
14/10/02 12:21:11 INFO SecurityManager: Changing view acls to: root
14/10/02 12:21:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
14/10/02 12:21:11 INFO HttpServer: Starting HTTP Server
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
Type in expressions to have them evaluated.
Type :help for more information.
14/10/02 12:21:17 INFO SecurityManager: Changing view acls to: root
14/10/02 12:21:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
14/10/02 12:21:18 INFO Slf4jLogger: Slf4jLogger started
14/10/02 12:21:18 INFO Remoting: Starting remoting
14/10/02 12:21:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@clouderamain.cluster.local:40136]
14/10/02 12:21:18 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@clouderamain.cluster.local:40136]
14/10/02 12:21:18 INFO SparkEnv: Registering MapOutputTracker
14/10/02 12:21:18 INFO SparkEnv: Registering BlockManagerMaster
14/10/02 12:21:18 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141002122118-5a5c
14/10/02 12:21:18 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
14/10/02 12:21:18 INFO ConnectionManager: Bound socket to port 44633 with id = ConnectionManagerId(clouderamain.cluster.local,44633)
14/10/02 12:21:18 INFO BlockManagerMaster: Trying to register BlockManager
14/10/02 12:21:18 INFO BlockManagerInfo: Registering block manager clouderamain.cluster.local:44633 with 294.9 MB RAM
14/10/02 12:21:18 INFO BlockManagerMaster: Registered BlockManager
14/10/02 12:21:19 INFO HttpServer: Starting HTTP Server
14/10/02 12:21:19 INFO HttpBroadcast: Broadcast server started at http://111.111.168.96:49465
14/10/02 12:21:19 INFO HttpFileServer: HTTP File server directory is /tmp/spark-468b7112-def1-42f7-ba3d-166cf09f919c
14/10/02 12:21:19 INFO HttpServer: Starting HTTP Server
14/10/02 12:21:19 INFO SparkUI: Started SparkUI at http://clouderamain.cluster.local:4040
14/10/02 12:21:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/02 12:21:23 INFO EventLoggingListener: Logging events to /user/spark/applicationHistory/spark-shell-1412266881060
--args is deprecated. Use --arg instead.
14/10/02 12:21:24 INFO RMProxy: Connecting to ResourceManager at clouderahost1.cluster.local/111.111.168.97:8032
14/10/02 12:21:24 INFO Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 3
14/10/02 12:21:24 INFO Client: Queue info ... queueName: root.default, queueCurrentCapacity: 0.0, queueMaxCapacity: -1.0, queueApplicationCount = 0, queueChildQueueCount = 0
14/10/02 12:21:24 INFO Client: Max mem capabililty of a single resource in this cluster 4096
14/10/02 12:21:24 INFO Client: Preparing Local resources
14/10/02 12:21:24 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 7996 for cloudera on ha-hdfs:nameservice1
14/10/02 12:21:24 INFO Client: Uploading hdfs://nameservice1:8020/user/spark/share/lib/spark-assembly.jar to hdfs://nameservice1/user/cloudera/.sparkStaging/application_1412014410679_0011/spark-assembly.jar
14/10/02 12:21:31 INFO Client: Setting up the launch environment
14/10/02 12:21:31 INFO Client: Setting up container launch context
14/10/02 12:21:32 INFO Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx512m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.tachyonStore.folderName=\"spark-edeaaf88-58ab-4e56-b072-e63837686234\", -Dspark.eventLog.enabled=\"true\", -Dspark.yarn.secondary.jars=\"\", -Dspark.home=\"/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/spark\", -Dspark.repl.class.uri=\"http://111.111.168.96:58885\", -Dspark.driver.host=\"clouderamain.cluster.local\", -Dspark.driver.appUIHistoryAddress=\"\", -Dspark.app.name=\"Spark shell\", -Dspark.jars=\"\", -Dspark.fileserver.uri=\"http://111.111.168.96:33361\", -Dspark.eventLog.dir=\"/user/spark/applicationHistory\", -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"40136\", -Dspark.httpBroadcast.uri=\"http://111.111.168.96:49465\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar , null, --args 'clouderamain.cluster.local:40136' , --executor-memory, 1024, --executor-cores, 1, --num-executors , 2, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
14/10/02 12:21:32 INFO Client: Submitting application to ASM
14/10/02 12:21:32 INFO YarnClientImpl: Submitted application application_1412014410679_0011
14/10/02 12:21:32 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:33 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:34 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:35 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:36 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:37 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:38 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:39 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:40 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:41 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:42 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:43 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:44 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:45 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:46 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:47 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:48 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:49 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:50 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:51 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:52 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: -1 appStartTime: 1412266892078 yarnAppState: ACCEPTED
14/10/02 12:21:53 INFO YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: 0 appStartTime: 1412266892078 yarnAppState: RUNNING
14/10/02 12:21:55 INFO YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/10/02 12:21:55 INFO SparkILoop: Created spark context..
Spark context available as sc.
scala> 14/10/02 12:22:17 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@clouderahost1.cluster.local:56255/user/Executor#-475572516] with ID 2
14/10/02 12:22:18 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@clouderamain.cluster.local:51361/user/Executor#378528362] with ID 1
14/10/02 12:22:18 INFO BlockManagerInfo: Registering block manager clouderahost1.cluster.local:58034 with 589.2 MB RAM
14/10/02 12:22:18 INFO BlockManagerInfo: Registering block manager clouderamain.cluster.local:58125 with 589.2 MB RAM
sc.setLocalProperty("yarn.nodemanager.delete.debug-delay-sec", "36000")

scala> val textFile = sc.textFile("salaries.java")
14/10/02 12:24:17 INFO MemoryStore: ensureFreeSpace(240695) called with curMem=0, maxMem=309225062
14/10/02 12:24:17 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 235.1 KB, free 294.7 MB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> textFile.count()
14/10/02 12:24:19 INFO FileInputFormat: Total input paths to process : 1
14/10/02 12:24:19 INFO SparkContext: Starting job: count at <console>:15
14/10/02 12:24:19 INFO DAGScheduler: Got job 0 (count at <console>:15) with 2 output partitions (allowLocal=false)
14/10/02 12:24:19 INFO DAGScheduler: Final stage: Stage 0(count at <console>:15)
14/10/02 12:24:19 INFO DAGScheduler: Parents of final stage: List()
14/10/02 12:24:19 INFO DAGScheduler: Missing parents: List()
14/10/02 12:24:19 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at textFile at <console>:12), which has no missing parents
14/10/02 12:24:19 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at textFile at <console>:12)
14/10/02 12:24:19 INFO YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
14/10/02 12:24:20 INFO RackResolver: Resolved clouderamain.cluster.local to /default
14/10/02 12:24:20 INFO RackResolver: Resolved clouderahost1.cluster.local to /default
14/10/02 12:24:20 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 1: clouderamain.cluster.local (NODE_LOCAL)
14/10/02 12:24:20 INFO TaskSetManager: Serialized task 0.0:0 as 1711 bytes in 4 ms
14/10/02 12:24:20 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 2: clouderahost1.cluster.local (NODE_LOCAL)
14/10/02 12:24:20 INFO TaskSetManager: Serialized task 0.0:1 as 1711 bytes in 0 ms
14/10/02 12:24:20 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
14/10/02 12:24:20 WARN TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
        at java.lang.Class.getDeclaredFields0(Native Method)
        at java.lang.Class.privateGetDeclaredFields(Class.java:2397)
        at java.lang.Class.getDeclaredField(Class.java:1946)
        at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
        at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
        at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
        at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
        at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/10/02 12:24:20 INFO TaskSetManager: Starting task 0.0:1 as TID 2 on executor 1: clouderamain.cluster.local (NODE_LOCAL)
14/10/02 12:24:20 INFO TaskSetManager: Serialized task 0.0:1 as 1711 bytes in 1 ms
14/10/02 12:24:20 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/10/02 12:24:20 INFO TaskSetManager: Loss was due to java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf [duplicate 1]
14/10/02 12:24:20 INFO TaskSetManager: Starting task 0.0:0 as TID 3 on executor 2: clouderahost1.cluster.local (NODE_LOCAL)
14/10/02 12:24:20 INFO TaskSetManager: Serialized task 0.0:0 as 1711 bytes in 1 ms
14/10/02 12:24:21 INFO YarnClientSchedulerBackend: Executor 1 disconnected, so removing it
14/10/02 12:24:21 ERROR YarnClientClusterScheduler: Lost executor 1 on clouderamain.cluster.local: remote Akka client disassociated
14/10/02 12:24:21 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
14/10/02 12:24:21 WARN TaskSetManager: Lost TID 2 (task 0.0:1)
14/10/02 12:24:21 INFO YarnClientSchedulerBackend: Executor 2 disconnected, so removing it
14/10/02 12:24:21 ERROR YarnClientClusterScheduler: Lost executor 2 on clouderahost1.cluster.local: remote Akka client disassociated
14/10/02 12:24:21 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 0.0
14/10/02 12:24:21 WARN TaskSetManager: Lost TID 3 (task 0.0:0)
14/10/02 12:24:21 INFO DAGScheduler: Executor lost: 1 (epoch 0)
14/10/02 12:24:21 INFO BlockManagerMasterActor: Trying to remove executor 1 from BlockManagerMaster.
14/10/02 12:24:21 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
14/10/02 12:24:21 INFO DAGScheduler: Executor lost: 2 (epoch 1)
14/10/02 12:24:21 INFO BlockManagerMasterActor: Trying to remove executor 2 from BlockManagerMaster.
14/10/02 12:24:21 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
14/10/02 12:24:38 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@clouderahost1.cluster.local:60965/user/Executor#560179254] with ID 4
14/10/02 12:24:38 INFO TaskSetManager: Starting task 0.0:0 as TID 4 on executor 4: clouderahost1.cluster.local (PROCESS_LOCAL)
14/10/02 12:24:38 INFO TaskSetManager: Serialized task 0.0:0 as 1711 bytes in 0 ms
14/10/02 12:24:38 INFO BlockManagerInfo: Registering block manager clouderahost1.cluster.local:47072 with 589.2 MB RAM
14/10/02 12:24:39 INFO TaskSetManager: Starting task 0.0:1 as TID 5 on executor 4: clouderahost1.cluster.local (PROCESS_LOCAL)
14/10/02 12:24:39 INFO TaskSetManager: Serialized task 0.0:1 as 1711 bytes in 0 ms
14/10/02 12:24:39 WARN TaskSetManager: Lost TID 4 (task 0.0:0)
14/10/02 12:24:39 WARN TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
        at java.lang.Class.getDeclaredFields0(Native Method)
        at java.lang.Class.privateGetDeclaredFields(Class.java:2397)
        at java.lang.Class.getDeclaredField(Class.java:1946)
        at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
        at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
        at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
        at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
        at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/10/02 12:24:39 INFO YarnClientSchedulerBackend: Executor 4 disconnected, so removing it
14/10/02 12:24:39 ERROR YarnClientClusterScheduler: Lost executor 4 on clouderahost1.cluster.local: remote Akka client disassociated
14/10/02 12:24:39 INFO TaskSetManager: Re-queueing tasks for 4 from TaskSet 0.0
14/10/02 12:24:39 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
14/10/02 12:24:39 INFO DAGScheduler: Executor lost: 4 (epoch 2)
14/10/02 12:24:39 INFO BlockManagerMasterActor: Trying to remove executor 4 from BlockManagerMaster.
14/10/02 12:24:39 INFO BlockManagerMaster: Removed 4 successfully in removeExecutor
14/10/02 12:24:39 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@clouderamain.cluster.local:41804/user/Executor#-1065411697] with ID 3
14/10/02 12:24:39 INFO TaskSetManager: Starting task 0.0:1 as TID 6 on executor 3: clouderamain.cluster.local (PROCESS_LOCAL)
14/10/02 12:24:39 INFO TaskSetManager: Serialized task 0.0:1 as 1711 bytes in 0 ms
14/10/02 12:24:39 INFO BlockManagerInfo: Registering block manager clouderamain.cluster.local:41099 with 589.2 MB RAM
14/10/02 12:24:40 INFO TaskSetManager: Starting task 0.0:0 as TID 7 on executor 3: clouderamain.cluster.local (PROCESS_LOCAL)
14/10/02 12:24:40 INFO TaskSetManager: Serialized task 0.0:0 as 1711 bytes in 0 ms
14/10/02 12:24:40 WARN TaskSetManager: Lost TID 6 (task 0.0:1)
14/10/02 12:24:40 INFO TaskSetManager: Loss was due to java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf [duplicate 1]
14/10/02 12:24:40 ERROR TaskSetManager: Task 0.0:1 failed 4 times; aborting job
14/10/02 12:24:40 INFO TaskSetManager: Loss was due to java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf [duplicate 2]
14/10/02 12:24:40 INFO YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/10/02 12:24:40 INFO DAGScheduler: Failed to run count at <console>:15
14/10/02 12:24:40 INFO YarnClientClusterScheduler: Cancelling stage 0
14/10/02 12:24:41 INFO YarnClientSchedulerBackend: Executor 3 disconnected, so removing it
14/10/02 12:24:41 ERROR YarnClientClusterScheduler: Lost executor 3 on clouderamain.cluster.local: remote Akka client disassociated
14/10/02 12:24:41 INFO DAGScheduler: Executor lost: 3 (epoch 3)
14/10/02 12:24:41 INFO BlockManagerMasterActor: Trying to remove executor 3 from BlockManagerMaster.
14/10/02 12:24:41 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 4 times, most recent failure: Exception failure in TID 6 on host clouderamain.cluster.local: java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
        java.lang.Class.getDeclaredFields0(Native Method)
        java.lang.Class.privateGetDeclaredFields(Class.java:2397)
        java.lang.Class.getDeclaredField(Class.java:1946)
        java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
        java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
        java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
        java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        java.security.AccessController.doPrivileged(Native Method)
        java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        java.lang.reflect.Method.invoke(Method.java:606)
        java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
        org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
        org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
        java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
        org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

scala> 14/10/02 12:24:55 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@clouderahost1.cluster.local:59671/user/Executor#-585651425] with ID 6
14/10/02 12:24:55 INFO BlockManagerInfo: Registering block manager clouderahost1.cluster.local:40622 with 589.2 MB RAM
14/10/02 12:24:58 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@clouderamain.cluster.local:44204/user/Executor#907271125] with ID 5
14/10/02 12:24:58 INFO BlockManagerInfo: Registering block manager clouderamain.cluster.local:37710 with 589.2 MB RAM
And here's the launch information from the container executor:
#!/bin/bash

export SPARK_YARN_MODE="true"
export SPARK_YARN_STAGING_DIR=".sparkStaging/application_1412014410679_0011/"
export SPARK_YARN_CACHE_FILES_VISIBILITIES="PRIVATE"
export JAVA_HOME="/usr/java/jdk1.7.0_55-cloudera"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=^M
"
export HADOOP_YARN_HOME="/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop-yarn"
export NM_HOST="clouderahost1.cluster.local"
export JVM_PID="$$"
export SPARK_USER="cloudera"
export SPARK_YARN_CACHE_FILES_TIME_STAMPS="1412266890783"
export PWD="/yarn/nm/usercache/cloudera/appcache/application_1412014410679_0011/container_1412014410679_0011_01_000011"
export NM_PORT="8041"
export LOGNAME="cloudera"
export MALLOC_ARENA_MAX="4"
export LOG_DIRS="/var/log/hadoop-yarn/container/application_1412014410679_0011/container_1412014410679_0011_01_000011"
export SPARK_YARN_CACHE_FILES_FILE_SIZES="93542713"
export NM_HTTP_PORT="8042"
export LOCAL_DIRS="/yarn/nm/usercache/cloudera/appcache/application_1412014410679_0011"
export SPARK_YARN_CACHE_FILES="hdfs://nameservice1/user/cloudera/.sparkStaging/application_1412014410679_0011/spark-assembly.jar#__spark__.jar"
export HADOOP_COMMON_HOME="/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop"
export HADOOP_TOKEN_FILE_LOCATION="/yarn/nm/usercache/cloudera/appcache/application_1412014410679_0011/container_1412014410679_0011_01_000011/container_tokens"
export CLASSPATH="$PWD/__spark__.jar:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$PWD/__app__.jar:$PWD/:$PWD:$PWD/*"
export USER="cloudera"
export HADOOP_HDFS_HOME="/opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop-hdfs"
export CONTAINER_ID="container_1412014410679_0011_01_000011"
export HOME="/home/"
export HADOOP_CONF_DIR="/var/run/cloudera-scm-agent/process/8994-yarn-NODEMANAGER"
ln -sf "/yarn/nm/usercache/cloudera/filecache/20/spark-assembly.jar" "__spark__.jar"
exec /bin/bash -c "$JAVA_HOME/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=log4j-spark-container.properties org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://spark@clouderamain.cluster.local:40136/user/CoarseGrainedScheduler 6 clouderahost1.cluster.local 1 1> /var/log/hadoop-yarn/container/application_1412014410679_0011/container_1412014410679_0011_01_000011/stdout 2> /var/log/hadoop-yarn/container/application_1412014410679_0011/container_1412014410679_0011_01_000011/stderr"
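(In case anyone wants to reproduce this: the script above was pulled off the NodeManager host while the yarn.nodemanager.delete.debug-delay-sec window kept the container directory around. Roughly, using the LOCAL_DIRS value from the environment above:)

# locate the preserved container launch script on the NodeManager host (sketch)
find /yarn/nm/usercache/cloudera/appcache/application_1412014410679_0011 -name launch_container.sh
# the exported CLASSPATH is near the top of that script
grep CLASSPATH /yarn/nm/usercache/cloudera/appcache/application_1412014410679_0011/container_1412014410679_0011_01_000011/launch_container.sh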
Any help would be appreciated,
Brian
Created 10-02-2014 10:11 AM
Not that it's terribly helpful, but I tried the same on a CDH 5.1.3 cluster and it seemed to work fine.
The error indicates some problem finding Hadoop classes. Do you have any modifications to the Spark config, or anything else changing the installation? Are you running the binaries from CDH or your own build?
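For example, a quick way to confirm which binaries are actually being run (just a sketch, assuming a CM-managed parcel install):

# which spark-shell is on the PATH, and where does it resolve to?
readlink -f "$(which spark-shell)"
# with parcels this should land somewhere under /opt/cloudera/parcels/CDH-.../lib/spark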
Created 10-02-2014 10:15 AM
Thanks for the quick reply, Sean.
I do not have any modifications to the Spark configuration and have removed and re-added the Spark service using CM to ensure this. All binaries are directly from CDH.
I have been able to run it successfully on other clusters as well; it's just this particular cluster where we are having an issue. I don't know that it matters, but here's my cluster layout (this is a lab/test cluster, so it's only 3 nodes):
clouderahost1.cluster.local 32 Role(s)
clouderahost2.cluster.local 16 Role(s)
clouderamain.cluster.local 18 Role(s)
Created 10-02-2014 10:50 AM
If you're using yarn-client, you're actually using YARN and not the Spark service that you see in CM; the Spark service is for standalone mode. I don't think that explains it per se, just an FYI. If the symptom is that the Hadoop jars are not being found on the classpath, maybe spot-check the locations it names in the classpath to see that they exist on every node? I wonder if one of the nodes has a parcel directory in a bad state. It's a long shot, but that would be my next guess.
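Something quick along these lines would do it (just a sketch, using the three hosts you listed and the parcel path from your logs):

# spot-check that the parcel's Hadoop directories exist on every node
for h in clouderamain.cluster.local clouderahost1.cluster.local clouderahost2.cluster.local; do
  echo "== $h =="
  ssh "$h" 'ls -d /opt/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/lib/hadoop* 2>&1'
done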
Created 10-02-2014 11:07 AM
Yes, I'm aware that the Spark service isn't actually used with YARN, but CM doesn't allow you to add Gateways without at least a single Master and a single Worker. 😕 I have made sure that those services are not running.
I did some spot-checking and everything exists; nothing really jumps out. The only other thing I can think of is that this cluster was (long ago) installed using packages instead of parcels, on CDH 4. Maybe there's something left over.
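This is roughly how I'm checking for leftovers on each node (a sketch; RPM-based OS assumed):

# look for leftover package-era installs that could shadow the parcel
rpm -qa | grep -iE 'hadoop|hive|spark'
ls -d /usr/lib/hadoop* /usr/lib/hive* 2>/dev/null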
Created 10-02-2014 12:56 PM
Investigating the nodes, it turns out there were some /usr/lib/xxx folders left over (e.g. hadoop, hive). I removed them, along with a couple of leftover conf folders. Lastly, I performed a full cluster restart and a clean_restart of the cloudera-scm-agents. Still seeing the same issue. 😞
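For completeness, this is roughly how I verified that nothing still points at the old package locations (a sketch; RHEL-style alternatives and the parcel install assumed):

# confirm client binaries/configs resolve into the parcel, not old /usr/lib paths
readlink -f "$(which hadoop)"          # expect a path under /opt/cloudera/parcels/CDH-...
alternatives --display hadoop-conf | head -n 3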
Created 10-14-2014 05:02 PM
I got the same error trying to run Spark on YARN. I fixed it by copying /usr/lib/hadoop/client/hadoop-mapreduce-client-core.jar into HDFS and then pointing the spark.yarn.dist.files property in my /etc/spark/conf/spark-defaults.conf file at it:
spark.yarn.dist.files /my/path/on/hdfs/hadoop-mapreduce-client-core.jar
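For anyone else hitting this, the whole workaround was roughly as follows (the HDFS directory is just the one I picked; substitute your own):

# copy the MR client jar into HDFS so YARN can distribute it to the executors
hadoop fs -mkdir -p /my/path/on/hdfs
hadoop fs -put /usr/lib/hadoop/client/hadoop-mapreduce-client-core.jar /my/path/on/hdfs/
# then add the spark.yarn.dist.files line above to spark-defaults.conf and relaunch spark-shell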