Reply
Explorer
Posts: 25
Registered: ‎08-26-2016

Where does spark-submit look for Jar files?

Hello,

 

I am running a sort job through spark-submit. I've also written a custom compression codec (instead of the standard lzf or snappy). The custom codec is packaged in a jar file. My goal is to have spark use my custom codec for compression.

 

For the first couple of compression operations, my codec is indeed used (I see debug messages). Later, there is an exception thrown (java.lang.NoSuchMethodException, stating that my Java class isn't found).

 

I'm using YARN. My jar file is on the master node. It's path is specified in the /etc/spark/conf/classpath.txt.

 

Any ideas on why it's not being found (and only sometimes)? Perhaps I ought to specify the jar file's location in some other way?

 

Your suggestions please.

 

Thanks.

 

 

Cloudera Employee
Posts: 60
Registered: ‎03-01-2016

Re: Where does spark-submit look for Jar files?

The proper way to add custom jar to classpath is using the "--jar" option with spark-submit:

 

 --jars JARS Comma-separated list of local jars to include on the driver and executor classpaths.

Explorer
Posts: 25
Registered: ‎08-26-2016

Re: Where does spark-submit look for Jar files?

Thank you for your reply.

 

I tried out your suggestion.

spark-submit now includes a --jars line, specifying the local path of the custom jar file on the master node.

 

With this change, the job continues to behave as earlier. It successfully finds the class in the custom jar file for the first couple of invocations, and later throws a java.lang.NoSuchMethodException.

 

It feels like there is something simple I'm missing?

 

Thanks.

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Where does spark-submit look for Jar files?

--jars isn't quite relevant here, as it will just put classes in the same classloader as if you'd packaged the classes with your app. The key is that there are many classloaders at play here and only some can see user classes. IIRC codec classes are the most problematic because they're necessary within Spark. You should post more details about the failure. Also try the "user classpath first" options.

Explorer
Posts: 25
Registered: ‎08-26-2016

Re: Where does spark-submit look for Jar files?

Thanks for your message. It definitely seems like different things are at play here.

 

Here's the stack trace of the failure.

 

User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, nje-amazon4): java.io.IOException: java.lang.NoSuchMethodException: com.test.new_codec.<init>(org.apache.spark.SparkConf)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1178)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:65)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: com.test.new_codec.<init>(org.apache.spark.SparkConf)
at java.lang.Class.getConstructor0(Class.java:2849)
at java.lang.Class.getConstructor(Class.java:1718)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:66)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:167)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1175)
... 12 more

 

 

Thanks.

Explorer
Posts: 25
Registered: ‎08-26-2016

Re: Where does spark-submit look for Jar files?

The corresponding driver stack trace is:

 

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1282)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1281)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1281)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1507)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1469)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:905)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.collect(RDD.scala:904)
at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:264)
at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:126)
at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
at com.github.ehiggs.spark.terasort.TeraSort$.main(TeraSort.scala:79)
at com.github.ehiggs.spark.terasort.TeraSort.main(TeraSort.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Explorer
Posts: 25
Registered: ‎08-26-2016

Re: Where does spark-submit look for Jar files?

I must add that the job runs through successfully to completion, when I run in standalone mode. i.e.

spark.master=local[*]

 

 

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Where does spark-submit look for Jar files?

That one's actually easy. As it says your codec doesn't have a constructor accepting a SparkConf.

Explorer
Posts: 25
Registered: ‎08-26-2016

Re: Where does spark-submit look for Jar files?

Actually, it does indeed have such a constructor. What's strange is that it isn't visible at this point.

As I mentioned earlier, the codec is successfully instantiated the first couple of times (with that same constructor, as there's only one in that class).

 

Thanks again for your quick responses.

 

Explorer
Posts: 25
Registered: ‎08-26-2016

Re: Where does spark-submit look for Jar files?

I was able to fix this, by changing the name of the Java class to something else. i.e. once I renamed the Java class, and recreated the Jar file, things worked smoothly.

 

If I recreate the Jar file with the original named Java class, the failure is again seen.

 

I'm not yet sure how to explain this. Perhaps there is an older version of my Jar file that is conflicting with this name. I couldn't find one in any of the folders that I looked in. 

 

Although I have a working solution, I'm still curious about the reason for the failure. Your tips welcome.

 

Thanks.

Announcements