Created on 03-25-2015 01:56 PM - edited 09-16-2022 02:25 AM
I'm trying to use Spark (standalone) to load data into Hive tables. The Avro schema loads successfully, and I can see (on the Spark UI page) that my applications have finished running; however, the applications are in the Killed state.
This is the stderr.log on the Spark web UI page via Cloudera Manager:
15/03/25 06:15:58 ERROR Executor: Exception in task 1.3 in stage 2.0 (TID 10)
java.io.InvalidClassException: org.apache.spark.rdd.PairRDDFunctions; local class incompatible: stream classdesc serialVersionUID = 8789839749593513237, local class serialVersionUID = -4145741279224749316
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/03/25 06:15:59 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@HadoopNode01.local:48707] -> [akka.tcp://sparkDriver@HadoopNode02.local:54550] disassociated! Shutting down.
Any help will be greatly appreciated.
Thanks
Created 04-01-2015 07:48 AM
I checked my network config; everything seems to be all right. Every node can communicate with every other node in my cluster. My firewall has been disabled entirely, so this may not be a 'port-not-open' issue. I was looking at this other post, which I think is discussing the same connectivity error: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Akka-Error-while-running-Spark-Jobs...
Could you please let me know if there are any Spark config files or any Spark-specific settings that I need to look into?
Thank you
Created on 04-01-2015 01:19 PM - edited 04-01-2015 01:23 PM
Going back to the earlier serialVersionUID conflict error (java.io.InvalidClassException: org.apache.spark.rdd.PairRDDFunctions; local class incompatible: stream classdesc serialVersionUID = 8789839749593513237, local class serialVersionUID = -4145741279224749316), I've found that my application uses the Spark jar spark-core_2.10-1.2.0-cdh5.3.0.jar, which contains the class org.apache.spark.rdd.PairRDDFunctions shown in the error. How do I check the serialVersionUID in this jar? And could you please tell me which other Spark jar (from Cloudera Manager/CDH) this jar could possibly be conflicting with? Is it spark-assembly.jar?
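Edit: for anyone else wondering the same thing, I believe the JDK's serialver tool can read it, e.g. serialver -classpath spark-core_2.10-1.2.0-cdh5.3.0.jar org.apache.spark.rdd.PairRDDFunctions (the jar's Scala dependencies may also need to be on the classpath). Something like this small Scala sketch should work too; I haven't verified it against the CDH jars, and ObjectStreamClass computes the default UID when the class doesn't declare one:

import java.io.ObjectStreamClass

// Prints the serialVersionUID of a class as seen by whichever jar is on the
// classpath. Compile with scalac, then run with the jar under test first on
// the classpath, e.g.:
//   scala -cp spark-core_2.10-1.2.0-cdh5.3.0.jar:. CheckUid org.apache.spark.rdd.PairRDDFunctions
object CheckUid {
  def main(args: Array[String]): Unit = {
    val name = args.headOption.getOrElse("org.apache.spark.rdd.PairRDDFunctions")
    // Load without initializing, so static initializers don't have to run
    val cls = Class.forName(name, false, getClass.getClassLoader)
    Option(ObjectStreamClass.lookup(cls)) match {
      case Some(desc) => println(s"$name serialVersionUID = ${desc.getSerialVersionUID}")
      case None       => println(s"$name is not serializable")
    }
  }
}

If two jars (say, on the driver and executor classpaths) report different values for the same class, that would match the exception above.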
Created 04-08-2015 05:17 AM
Hi,
I have exactly the same issue and am trying to do a BDD install too. Did you find a solution to the problem?
Created 04-08-2015 06:49 AM
Actually yes, I figured out what the problem was. The CDH jars that are shipped with BDD are version 5.3.0, whereas the Cloudera CDH I have installed on my cluster is version 5.3.2. Due to this version mismatch I was getting this error. I removed CDH 5.3.2 and replaced it with Cloudera parcels of version 5.3.0 (basically a fresh installation of CM and the other Hadoop components), and this error no longer appears.
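As a rough sanity check after a reinstall like this (just a sketch, assuming spark-shell connects to the same master your jobs use), you can print which jar the RDD class is loaded from on the driver and on the executors; different CDH versions in these paths would produce exactly the InvalidClassException from earlier:

// Run in spark-shell: print which jar org.apache.spark.rdd.RDD was loaded
// from on the driver and on each executor.
def jarOf(cls: Class[_]): String =
  Option(cls.getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("unknown")

println("driver:   " + jarOf(classOf[org.apache.spark.rdd.RDD[_]]))

sc.parallelize(1 to 4, 4)
  .map(_ => jarOf(classOf[org.apache.spark.rdd.RDD[_]]))
  .distinct()
  .collect()
  .foreach(j => println("executor: " + j))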
However, once that error was cleared I ran into another issue, which I have described in this post: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-ERROR-CoarseGrainedExecutorBa...
Could you please let me know (after you implement the version change) if you're getting this same error or if BDD works all the way? Thanks
Bob
Created 04-08-2015 10:04 AM
Will do for sure. I have had my head buried in log files this afternoon and I think I am going crazy!
Created 04-08-2015 10:11 AM
There are very few people out there working on BDD, so please do keep me posted on how things go once you make this fix.
Created 04-22-2015 03:36 PM
For me, both these issues were resolved once I installed CDH 5.3.0. The CDH version of the BDD jars can be found under the folder ${oracle_home}/Middleware/BDD1.0/dataprocessing/edp_cli/libs
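If it's useful to anyone comparing versions, a small sketch along these lines can list the CDH-versioned jars that BDD bundles (the ORACLE_HOME environment variable and the fallback path are assumptions; substitute your actual install location):

import java.io.File

// List the CDH-versioned jars bundled with BDD so they can be compared
// against the parcel version installed on the cluster.
object ListBddCdhJars {
  def main(args: Array[String]): Unit = {
    val oracleHome = sys.env.getOrElse("ORACLE_HOME", "/opt/oracle") // assumed
    val libs = new File(s"$oracleHome/Middleware/BDD1.0/dataprocessing/edp_cli/libs")
    Option(libs.listFiles()).getOrElse(Array.empty[File])
      .map(_.getName)
      .filter(n => n.endsWith(".jar") && n.contains("cdh"))
      .sorted
      .foreach(println)
  }
}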