
Error while executing Spark 2.0 program written in Scala


Hi,

I am trying to execute a word count program on Spark 2.0. Instead of using DataFrames, I am doing this with RDDs.

 

Spark Version - 2.0.2
Scala Version - 2.11.8
Java Version - 1.8.0_144
package word_count

import org.apache.spark.sql.SparkSession

object HelloWorld {
  def main(args: Array[String]): Unit = {
    // A SparkSession can be created using the builder pattern
    val spark = SparkSession.builder.master("local")
                                    .appName("my-spark-app")
                                    .getOrCreate()

    // Access the underlying SparkContext
    val sc = spark.sparkContext

    // org.apache.spark.rdd.RDD[String]
    val words = sc.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/breast-cancer-wisconsin.data")

    // org.apache.spark.rdd.RDD[String]
    val wordsFlatMap = words.flatMap(_.split(","))

    // For each word, create a tuple: the key is the word itself
    // and the value is '1'.
    // Type - org.apache.spark.rdd.RDD[(String, Int)]
    val wordsMap = wordsFlatMap.map(w => (w, 1))

    // Sum all occurrences of each word using 'reduceByKey'
    val wordCount = wordsMap.reduceByKey((a, b) => a + b)

    val wordCountSorted = wordCount.sortByKey(true)
    wordCountSorted.collect.foreach(println)
  }
}
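Just to sanity-check the logic itself, the same transformation chain behaves as I expect on plain Scala collections without Spark. This is only an illustrative sketch — the object name and the two sample lines are made up, not from the real data file:

```scala
// Plain-collections equivalent of the RDD pipeline above.
// NOTE: sample lines are invented for illustration, not real data.
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    val lines = List("1000025,5,1,1", "1002945,5,4,4")

    val counts = lines
      .flatMap(_.split(","))                           // split each line on commas
      .map(w => (w, 1))                                // pair each token with 1
      .groupBy(_._1)                                   // group the pairs by token
      .map { case (w, ps) => (w, ps.map(_._2).sum) }   // reduceByKey equivalent: sum the 1s
      .toList
      .sortBy(_._1)                                    // sortByKey equivalent

    // prints (1,2), (1000025,1), (1002945,1), (4,2), (5,2) one per line
    counts.foreach(println)
  }
}
```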

I compiled the program using scalac:

$ scalac -classpath "/home/cloudera/spark_progs/spark-sql_2.11-2.0.2.jar:/home/cloudera/spark_progs/spark-core_2.11-2.0.2.jar" -sourcepath src -d bin src/word_count/HelloWorld.scala

Then I created a jar file, 'hellow.jar'.

I executed it using spark-submit:

$ spark-submit --verbose --class word_count.HelloWorld --master local /home/cloudera/spark_progs/hellow.jar

I am getting this error:

17/10/03 10:18:58 INFO HadoopRDD: Input split: hdfs://quickstart.cloudera:8020/user/cloudera/breast-cancer-wisconsin.data:0+19889
17/10/03 10:18:58 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
17/10/03 10:18:58 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
17/10/03 10:18:58 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
17/10/03 10:18:58 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
17/10/03 10:18:58 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
17/10/03 10:18:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
        at word_count.HelloWorld$.$anonfun$main$1(HelloWorld.scala:28)
        at word_count.HelloWorld$.$anonfun$main$1$adapted(HelloWorld.scala:28)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
17/10/03 10:18:58 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
        at word_count.HelloWorld$.$anonfun$main$1(HelloWorld.scala:28)
        at word_count.HelloWorld$.$anonfun$main$1$adapted(HelloWorld.scala:28)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
 
 
17/10/03 10:18:58 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
17/10/03 10:18:58 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/03 10:18:58 INFO TaskSchedulerImpl: Cancelling stage 0
17/10/03 10:18:58 INFO DAGScheduler: ShuffleMapStage 0 (map at HelloWorld.scala:33) failed in 0.662 s
17/10/03 10:18:58 INFO DAGScheduler: Job 0 failed: collect at HelloWorld.scala:40, took 1.079757 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
        at word_count.HelloWorld$.$anonfun$main$1(HelloWorld.scala:28)
        at word_count.HelloWorld$.$anonfun$main$1$adapted(HelloWorld.scala:28)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
 
 
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1873)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1886)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1899)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1913)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:912)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:911)
        at word_count.HelloWorld$.main(HelloWorld.scala:40)
        at word_count.HelloWorld.main(HelloWorld.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
        at word_count.HelloWorld$.$anonfun$main$1(HelloWorld.scala:28)
        at word_count.HelloWorld$.$anonfun$main$1$adapted(HelloWorld.scala:28)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
17/10/03 10:18:58 INFO SparkContext: Invoking stop() from shutdown hook
17/10/03 10:18:58 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
17/10/03 10:18:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/10/03 10:18:58 INFO MemoryStore: MemoryStore cleared
17/10/03 10:18:58 INFO BlockManager: BlockManager stopped
17/10/03 10:18:58 INFO BlockManagerMaster: BlockManagerMaster stopped
17/10/03 10:18:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/10/03 10:18:58 INFO SparkContext: Successfully stopped SparkContext
17/10/03 10:18:58 INFO ShutdownHookManager: Shutdown hook called
17/10/03 10:18:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-122fe514-29a5-4387-8990-ec6e83f56e64

I am sure there are better ways to build and run this using sbt or Maven, but I wanted to try the basic command-line tools first.
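For reference, I would presumably move to a minimal sbt build something like the one below — this is only a sketch I have not tested here, with the versions mirroring the ones above and the "provided" scope keeping Spark out of the jar, since spark-submit supplies it at runtime:

```scala
// build.sbt — minimal sketch, not a tested build definition
name := "word-count"
version := "0.1"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided"
)
```

With that, `sbt package` would produce a jar under `target/scala-2.11/` that can be passed to spark-submit as before.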

 

The program fails after the HDFS file has been read using "sc.textFile".

The same statements work fine in 'spark-shell'.

 

Can you help me resolve this?

Thanks