Where can I find the jar file distributed through spark-submit?

New Contributor

I have submitted a Spark application jar that has been tested on most other Spark clusters (standalone, EMR, MapR), but it fails when submitted to Cloudera YARN. The following stack trace is thrown:

 

```

16/08/11 23:39:24 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, ip-172-31-23-253.us-west-1.compute.internal): java.lang.UnsupportedClassVersionError: com/schedule1/datapassport/http/ResilientRedirectStrategy : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at com.schedule1.datapassport.remote.HttpClient.<init>(HttpClient.scala:69)
at com.schedule1.datapassport.Const$.<init>(Const.scala:31)
at com.schedule1.datapassport.Const$.<clinit>(Const.scala)
at com.schedule1.datapassport.spark.command.VersionRow$.apply$default$2(VersionRow.scala:12)
at com.schedule1.datapassport.spark.sql.SecureSQLContextMixin$$anonfun$7.apply(SecureSQLContextMixin.scala:159)
at com.schedule1.datapassport.spark.sql.SecureSQLContextMixin$$anonfun$7.apply(SecureSQLContextMixin.scala:158)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

16/08/11 23:39:24 WARN scheduler.TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, ip-172-31-23-253.us-west-1.compute.internal): java.lang.NoClassDefFoundError: Could not initialize class com.schedule1.datapassport.Const$
at com.schedule1.datapassport.spark.command.VersionRow$.apply$default$2(VersionRow.scala:12)
at com.schedule1.datapassport.spark.sql.SecureSQLContextMixin$$anonfun$7.apply(SecureSQLContextMixin.scala:159)
at com.schedule1.datapassport.spark.sql.SecureSQLContextMixin$$anonfun$7.apply(SecureSQLContextMixin.scala:158)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

16/08/11 23:39:24 INFO scheduler.TaskSetManager: Starting task 3.1 in stage 0.0 (TID 4, ip-172-31-23-253.us-west-1.compute.internal, partition 3,PROCESS_LOCAL, 2205 bytes)
16/08/11 23:39:24 INFO scheduler.TaskSetManager: Starting task 2.1 in stage 0.0 (TID 5, ip-172-31-23-253.us-west-1.compute.internal, partition 2,PROCESS_LOCAL, 2238 bytes)
16/08/11 23:39:24 WARN server.TransportChannelHandler: Exception in connection from ip-172-31-23-253.us-west-1.compute.internal/172.31.23.253:59307
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
16/08/11 23:39:24 INFO cluster.YarnClientSchedulerBackend: Disabling executor 1.
16/08/11 23:39:24 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 0)
16/08/11 23:39:24 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
16/08/11 23:39:24 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, ip-172-31-23-253.us-west-1.compute.internal, 34787)
16/08/11 23:39:24 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
16/08/11 23:39:24 ERROR cluster.YarnScheduler: Lost executor 1 on ip-172-31-23-253.us-west-1.compute.internal: Container marked as failed: container_1470956114451_0003_01_000002 on host: ip-172-31-23-253.us-west-1.compute.internal. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1470956114451_0003_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:578)
at org.apache.hadoop.util.Shell.run(Shell.java:481)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:763)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 50

16/08/11 23:39:24 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1470956114451_0003_01_000002 on host: ip-172-31-23-253.us-west-1.compute.internal. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1470956114451_0003_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:578)
at org.apache.hadoop.util.Shell.run(Shell.java:481)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:763)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 50

16/08/11 23:39:24 WARN scheduler.TaskSetManager: Lost task 2.1 in stage 0.0 (TID 5, ip-172-31-23-253.us-west-1.compute.internal): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1470956114451_0003_01_000002 on host: ip-172-31-23-253.us-west-1.compute.internal. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1470956114451_0003_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:578)
at org.apache.hadoop.util.Shell.run(Shell.java:481)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:763)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 50

16/08/11 23:39:24 WARN scheduler.TaskSetManager: Lost task 3.1 in stage 0.0 (TID 4, ip-172-31-23-253.us-west-1.compute.internal): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1470956114451_0003_01_000002 on host: ip-172-31-23-253.us-west-1.compute.internal. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_1470956114451_0003_01_000002
Exit code: 50
Stack trace: ExitCodeException exitCode=50:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:578)
at org.apache.hadoop.util.Shell.run(Shell.java:481)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:763)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 50

```

 

It looks like YARN failed to deploy the latest version of my jar onto all executors, which results in the class `com.schedule1.datapassport.Const` being missing on the executors. I would like to manually verify that the jar has been deployed and is being used on every executor, and then delete any cached jar that is interfering with my execution. But I didn't find any jar written under $SPARK_HOME/work/, so where is the directory of the distributed jar file?
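
What I have in mind to verify this is something like the following spark-shell sketch (just a rough sketch, assuming the executors expose the usual JVM system properties); it prints each executor's working directory and the jar entries on its classpath, which should show where YARN actually localized my jar:

```
// Rough sketch (spark-shell): print, per executor JVM, the container working
// directory and the jar entries on its classpath. On YARN these should point
// into the NodeManager local dirs for this application, not $SPARK_HOME/work/.
sc.parallelize(1 to (sc.defaultParallelism * 10))
  .mapPartitions { _ =>
    val cwd  = System.getProperty("user.dir")
    val jars = System.getProperty("java.class.path")
      .split(java.io.File.pathSeparator)
      .filter(_.endsWith(".jar"))
      .mkString(", ")
    Iterator(cwd + " -> " + jars)
  }
  .distinct()
  .collect()
  .foreach(println)
```

If I understand the YARN deployment model correctly, the application jar is shipped through the YARN distributed cache and localized under the NodeManager local dirs (yarn.nodemanager.local-dirs, under usercache/<user>/appcache/<applicationId>/), so $SPARK_HOME/work/ (which only standalone workers use) would stay empty.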

1 REPLY

Re: Where can I find the jar file distributed through spark-submit?

New Contributor

BTW, this problem is unlikely to be caused by a Spark version inconsistency across the executors; spark-shell on YARN runs smoothly, and collecting the Spark version from every executor with the following code in spark-shell:

 

```
sc.parallelize(1 to (sc.defaultParallelism * 10))
  .mapPartitions(itr => Iterator(org.apache.spark.SPARK_VERSION))
  .map(_.toString)
  .collect()
  .foreach(println)
```

 

prints out the right version:

 

```
16/08/11 23:48:57 INFO spark.SparkContext: Starting job: collect at <console>:32
16/08/11 23:48:57 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:32) with 2 output partitions
16/08/11 23:48:57 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (collect at <console>:32)
16/08/11 23:48:57 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/08/11 23:48:57 INFO scheduler.DAGScheduler: Missing parents: List()
16/08/11 23:48:57 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at map at <console>:32), which has no missing parents
16/08/11 23:48:57 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.3 KB, free 2.3 KB)
16/08/11 23:48:57 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1379.0 B, free 3.6 KB)
16/08/11 23:48:57 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.23.254:59814 (size: 1379.0 B, free: 511.5 MB)
16/08/11 23:48:57 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/08/11 23:48:57 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at map at <console>:32)
16/08/11 23:48:57 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
16/08/11 23:48:58 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
16/08/11 23:48:59 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
1.6.2

1.6.2
```
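
Since the failure above is an UnsupportedClassVersionError for major.minor version 52.0 (classes compiled for Java 8), the same trick could also collect the executor JVM versions. A minimal sketch along the same lines, assuming the standard java.version / java.home system properties are set on the executors:

```
// Sketch: collect the JVM version and java.home from every executor, the same
// way SPARK_VERSION is collected above. major.minor 52.0 requires Java 8+.
sc.parallelize(1 to (sc.defaultParallelism * 10))
  .mapPartitions { _ =>
    Iterator(System.getProperty("java.version") + " @ " + System.getProperty("java.home"))
  }
  .distinct()
  .collect()
  .foreach(println)
```

If any executor reported a 1.7.x JVM, that alone would explain the UnsupportedClassVersionError even though the Spark version is 1.6.2 everywhere.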