
Run Spark App Error

New Contributor

Case 1: When using the java command line to run:

# $JAVA_HOME/bin/java -cp $CLASSPATH -Dspark.master=spark://10.xxx.xxx.xxx:43191 com.cloudera.sparkwordcount.SparkWordCount hdfs://xxxxxx.com:8020/user/hdfs/spark/LICENSE 2

I got java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom...

14/08/29 18:37:16 INFO spark.SecurityManager: Changing view acls to: root
14/08/29 18:37:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
14/08/29 18:37:17 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/08/29 18:37:17 INFO Remoting: Starting remoting
14/08/29 18:37:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@Exxxxy-head.amers1.ciscloud:52049]
14/08/29 18:37:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@Exxxxxx.ciscloud:52049]
14/08/29 18:37:17 INFO spark.SparkEnv: Registering MapOutputTracker
14/08/29 18:37:17 INFO spark.SparkEnv: Registering BlockManagerMaster
14/08/29 18:37:17 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140829183717-3369
14/08/29 18:37:17 INFO storage.MemoryStore: MemoryStore started with capacity 2.0 GB.
14/08/29 18:37:17 INFO network.ConnectionManager: Bound socket to port 45604 with id = ConnectionManagerId(xxxxx,45604)
14/08/29 18:37:17 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/08/29 18:37:17 INFO storage.BlockManagerInfo: Registering block manager ETSInterDay-head.amers1.ciscloud:45604 with 2.0 GB RAM
14/08/29 18:37:17 INFO storage.BlockManagerMaster: Registered BlockManager
14/08/29 18:37:17 INFO spark.HttpServer: Starting HTTP Server
14/08/29 18:37:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/29 18:37:17 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:54315
14/08/29 18:37:17 INFO broadcast.HttpBroadcast: Broadcast server started at http://xxxxx:54315
14/08/29 18:37:17 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-19eccd14-bc32-4112-9e97-2197e059456b
14/08/29 18:37:17 INFO spark.HttpServer: Starting HTTP Server
14/08/29 18:37:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/29 18:37:17 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50747
14/08/29 18:37:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/29 18:37:18 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/08/29 18:37:18 INFO ui.SparkUI: Started SparkUI at http://xxxd:4040
14/08/29 18:37:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/29 18:37:18 INFO client.AppClient$ClientActor: Connecting to master spark://1xxxx...
14/08/29 18:37:18 WARN storage.BlockManager: Putting block broadcast_0 failed
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:545)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:457)
at com.cloudera.sparkwordcount.SparkWordCount$.main(SparkWordCount.scala:17)
at com.cloudera.sparkwordcount.SparkWordCount.main(SparkWordCount.scala)

Case 2: When using spark-submit to run:
./spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master spark://xx.xxx.xxx.xxx:43191 /hadoop/cloudera/parcels/CDH/lib/spark/m/wordcount/target/sparkwordcount-0.0.1-SNAPSHOT.jar hdfs://xxxx.xxx.xxx:8020//user//hdfs//spark//LICENSE 2

I got:

14/08/29 18:41:59 INFO client.AppClient$ClientActor: Executor updated: app-20140829184159-0005/0 is now RUNNING
14/08/29 18:41:59 INFO client.AppClient$ClientActor: Executor updated: app-20140829184159-0005/1 is now RUNNING
14/08/29 18:41:59 INFO mapred.FileInputFormat: Total input paths to process : 1
14/08/29 18:42:00 INFO spark.SparkContext: Starting job: collect at SparkWordCount.scala:28
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Registering RDD 4 (reduceByKey at SparkWordCount.scala:20)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Registering RDD 10 (reduceByKey at SparkWordCount.scala:26)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Got job 0 (collect at SparkWordCount.scala:28) with 2 output partitions (allowLocal=false)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at SparkWordCount.scala:28)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[4] at reduceByKey at SparkWordCount.scala:20), which has no missing parents
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 2 (MapPartitionsRDD[4] at reduceByKey at SparkWordCount.scala:20)
14/08/29 18:42:00 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
14/08/29 18:42:01 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@xxx:44712/user/Executor#-1200084333] with ID 1
14/08/29 18:42:01 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 0 on executor 1: ETSInterDay-worker1.amers1.ciscloud (PROCESS_LOCAL)
14/08/29 18:42:01 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 2192 bytes in 2 ms
14/08/29 18:42:01 INFO scheduler.TaskSetManager: Starting task 2.0:1 as TID 1 on executor 1: ETSInterDay-worker1.amers1.ciscloud (PROCESS_LOCAL)
14/08/29 18:42:01 INFO scheduler.TaskSetManager: Serialized task 2.0:1 as 2192 bytes in 0 ms
14/08/29 18:42:01 INFO storage.BlockManagerInfo: Registering block manager ETSInterDay-worker1.amers1.ciscloud:41977 with 294.9 MB RAM
14/08/29 18:42:01 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@xxxxx:49084/user/Executor#593030937] with ID 0
14/08/29 18:42:02 INFO storage.BlockManagerInfo: Registering block manager xxxxxx:55303 with 294.9 MB RAM
14/08/29 18:42:02 WARN scheduler.TaskSetManager: Lost TID 0 (task 2.0:0)
14/08/29 18:42:02 WARN scheduler.TaskSetManager: Loss was due to java.lang.RuntimeException
java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:657)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:389)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:145)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:189)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:184)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Please help!


3 Replies

Master Collaborator

This is a conflict between the version of Guava that Spark uses and the version used by Hadoop. How are you packaging your app? And can you run it with spark-submit? That tends to take care of this conflict.
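
If you are building the app jar with Maven, one common workaround is to shade (relocate) Guava inside your own jar so it can no longer clash with the older Guava on the Hadoop classpath. A minimal pom.xml sketch, assuming a Maven build; the plugin version and shaded package name here are illustrative, not taken from the original project:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rewrite Guava references in the app jar to a private package,
               so the app stops resolving Guava from Hadoop's classpath -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>

With spark-submit, the launcher also sets up the Spark and Hadoop classpath for you, which is why it usually avoids the hashInt error you see with a bare java -cp invocation.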

New Contributor

I am getting the same error when launching a Spark job through Oozie using a Java action. Any update on how to resolve this?

Thanks.

New Contributor (Accepted Solution)

Solved this by defining the following property in workflow.xml:

<configuration>
  <property>
    <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
    <value>true</value>
  </property>
  .....
</configuration>