Created on 03-25-2015 01:56 PM - edited 09-16-2022 02:25 AM
I'm trying to use Spark (standalone) to load data into Hive tables. The Avro schema loads successfully, and I can see (on the Spark UI page) that my applications have finished running; however, the applications are in the Killed state.
This is the stderr.log on the Spark web UI page via Cloudera Manager:
15/03/25 06:15:58 ERROR Executor: Exception in task 1.3 in stage 2.0 (TID 10)
java.io.InvalidClassException: org.apache.spark.rdd.PairRDDFunctions; local class incompatible: stream classdesc serialVersionUID = 8789839749593513237, local class serialVersionUID = -4145741279224749316
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/03/25 06:15:59 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@HadoopNode01.local:48707] -> [akka.tcp://sparkDriver@HadoopNode02.local:54550] disassociated! Shutting down.
Any help will be greatly appreciated.
Thanks
Created 04-01-2015 07:48 AM
I checked my network config; everything seems to be all right. Every node can communicate with every other node in my cluster. My firewall has been disabled entirely, so this may not be a 'port-not-open' issue. I was looking at this other post, which I think is discussing the same connectivity error: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Akka-Error-while-running-Spark-Jobs...
Could you please let me know if there are any Spark config files or any Spark-specific settings that I need to look into?
Thank you
Created on 04-01-2015 01:19 PM - edited 04-01-2015 01:23 PM
Going back to the earlier serialVersionUID conflict error (java.io.InvalidClassException: org.apache.spark.rdd.PairRDDFunctions; local class incompatible: stream classdesc serialVersionUID = 8789839749593513237, local class serialVersionUID = -4145741279224749316), I've found that my application uses the Spark jar spark-core_2.10-1.2.0-cdh5.3.0.jar, which contains the class org.apache.spark.rdd.PairRDDFunctions shown in the error. How do I check the serialVersionUID in this jar? And could you please tell me which other Spark jar (from Cloudera Manager/CDH) this jar could possibly be conflicting with? Is it spark-assembly.jar?
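Edit: for anyone else wondering the same thing, I believe the JDK's serialver tool can read it, e.g. serialver -classpath spark-core_2.10-1.2.0-cdh5.3.0.jar org.apache.spark.rdd.PairRDDFunctions (the jar's Scala dependencies may also need to be on the classpath). Something like this small Scala sketch should work too; I haven't verified it against the CDH jars, and ObjectStreamClass computes the default UID when the class doesn't declare one:

import java.io.ObjectStreamClass

// Prints the serialVersionUID of a class as seen by whichever jar is on the
// classpath. Compile with scalac, then run with the jar under test first on
// the classpath, e.g.:
//   scala -cp spark-core_2.10-1.2.0-cdh5.3.0.jar:. CheckUid org.apache.spark.rdd.PairRDDFunctions
object CheckUid {
  def main(args: Array[String]): Unit = {
    val name = args.headOption.getOrElse("org.apache.spark.rdd.PairRDDFunctions")
    // Load without initializing, so static initializers don't have to run
    val cls = Class.forName(name, false, getClass.getClassLoader)
    Option(ObjectStreamClass.lookup(cls)) match {
      case Some(desc) => println(s"$name serialVersionUID = ${desc.getSerialVersionUID}")
      case None       => println(s"$name is not serializable")
    }
  }
}

If two jars (say, on the driver and executor classpaths) report different values for the same class, that would match the exception above.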
Created 04-08-2015 05:17 AM
Hi,
I have exactly the same issue and am trying to do a BDD install too. Did you find a solution to the problem?
Created 04-08-2015 06:49 AM
Actually yes, I figured out what the problem was. The CDH jars that are shipped with BDD are version 5.3.0, whereas the Cloudera CDH I have installed on my cluster is version 5.3.2. Due to this version mismatch I was getting this error. I removed CDH 5.3.2 and replaced it with Cloudera parcels of version 5.3.0 (basically a fresh installation of CM and the other Hadoop components), and this error no longer appears.
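As a rough sanity check after a reinstall like this (just a sketch, assuming spark-shell connects to the same master your jobs use), you can print which jar the RDD class is loaded from on the driver and on the executors; different CDH versions in these paths would produce exactly the InvalidClassException from earlier:

// Run in spark-shell: print which jar org.apache.spark.rdd.RDD was loaded
// from on the driver and on each executor.
def jarOf(cls: Class[_]): String =
  Option(cls.getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("unknown")

println("driver:   " + jarOf(classOf[org.apache.spark.rdd.RDD[_]]))

sc.parallelize(1 to 4, 4)
  .map(_ => jarOf(classOf[org.apache.spark.rdd.RDD[_]]))
  .distinct()
  .collect()
  .foreach(j => println("executor: " + j))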
However, once that error was cleared I ran into another issue, which I have described in this post: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-ERROR-CoarseGrainedExecutorBa...
Could you please let me know (after you implement the version change) if you're getting this same error or if BDD works all the way? Thanks
Bob
Created 04-08-2015 10:04 AM
Will do for sure. I have had my head buried in log files this afternoon and I think I am going crazy!
Created 04-08-2015 10:11 AM
There are very few people out there working on BDD, so please do keep me posted on how things go once you make this fix.
Created 04-22-2015 03:36 PM
For me, both these issues were resolved once I installed CDH 5.3.0. The CDH version of the BDD jars can be found under the folder ${oracle_home}/Middleware/BDD1.0/dataprocessing/edp_cli/libs
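If it's useful to anyone comparing versions, a small sketch along these lines can list the CDH-versioned jars that BDD bundles (the ORACLE_HOME environment variable and the fallback path are assumptions; substitute your actual install location):

import java.io.File

// List the CDH-versioned jars bundled with BDD so they can be compared
// against the parcel version installed on the cluster.
object ListBddCdhJars {
  def main(args: Array[String]): Unit = {
    val oracleHome = sys.env.getOrElse("ORACLE_HOME", "/opt/oracle") // assumed
    val libs = new File(s"$oracleHome/Middleware/BDD1.0/dataprocessing/edp_cli/libs")
    Option(libs.listFiles()).getOrElse(Array.empty[File])
      .map(_.getName)
      .filter(n => n.endsWith(".jar") && n.contains("cdh"))
      .sorted
      .foreach(println)
  }
}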