Exception in Spark: Task not serializable
Labels: Apache Spark
Created on 11-29-2018 04:21 AM - edited 09-16-2022 06:56 AM
Here is my code:

I have similar code in both of the blocks I highlighted. There is a 'map' function at line 58 and the corresponding 'aggregate' function at line 64; I don't have any issue with these. When the 'map' function at line 75 is executed, I get the 'Task not serializable' exception. Can I get some help here?

I get the following exception:
2018-11-29 04:01:13.098 00000123 FATAL: org.apache.spark.SparkException: Task not serializable
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
org.apache.spark.rdd.RDD.map(RDD.scala:369)
com.ibm.infosphere.dataquality.spark.SparkDataQualityAnalyzer.analyze(SparkDataQualityAnalyzer.scala:75)
com.ibm.infosphere.dataquality.spark.SparkDataQualityAnalyzer.analyze(SparkDataQualityAnalyzer.scala:101)
com.ibm.infosphere.ia.odf.services.DQALivySparkJob.processDataFrame(DQALivySparkJob.java:352)
com.ibm.iis.odf.api.spark.DataframeSparkService.runAnalysis(DataframeSparkService.java:216)
com.ibm.iis.odf.api.spark.ODFClouderaLivyJob.call(ODFClouderaLivyJob.java:50)
com.ibm.iis.odf.api.spark.ODFClouderaLivyJob.call(ODFClouderaLivyJob.java:25)
com.cloudera.livy.rsc.driver.BypassJob.call(BypassJob.java:40)
com.cloudera.livy.rsc.driver.BypassJob.call(BypassJob.java:27)
com.cloudera.livy.rsc.driver.JobWrapper.call(JobWrapper.java:57)
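The original code did not come through in the post, but for anyone hitting the same stack trace, here is a minimal sketch of the usual cause (the class and field names are hypothetical, not the code referenced above): a closure passed to map() that accidentally captures an enclosing, non-serializable object.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch -- not the original code from this post.
// 'Task not serializable' usually means the closure passed to map()
// captured something Spark cannot serialize, such as the enclosing class.
class Analyzer(sc: SparkContext) { // Analyzer does NOT extend Serializable
  val threshold = 10

  def analyze(): Long = {
    val rdd = sc.parallelize(1 to 100)
    // Referencing the field 'threshold' captures 'this', so Spark tries to
    // serialize the whole Analyzer instance and fails in
    // ClosureCleaner.ensureSerializable -- the same frame as in the trace above.
    rdd.map(x => x + threshold).count()
  }

  def analyzeFixed(): Long = {
    val rdd = sc.parallelize(1 to 100)
    val t = threshold // copy the field into a local val; only 't' is captured
    rdd.map(x => x + t).count()
  }
}
```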
Created 11-29-2018 07:31 AM
I figured out the problem here.

Most of the objects I use in my project are complex; of course, all of them implement the 'Serializable' interface. For these objects the Spark engine needs to use 'JavaSerializer' (org.apache.spark.serializer.JavaSerializer), but on this cluster it was using 'KryoSerializer' (org.apache.spark.serializer.KryoSerializer) by default.
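To confirm which serializer an application actually picked up, you can read the property from the SparkContext's configuration. A quick sketch, assuming an existing SparkContext named sc (e.g. in spark-shell); the fallback string reflects that upstream Spark defaults to JavaSerializer when the property is unset:

```scala
// Read the serializer in effect for this application; if 'spark.serializer'
// was never set, upstream Spark falls back to JavaSerializer.
val serializer = sc.getConf.get(
  "spark.serializer",
  "org.apache.spark.serializer.JavaSerializer")
println(s"Active serializer: $serializer")
```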
Solution:
1. Go to Cloudera Manager.
2. Select the 'spark2' service.
3. Go to the Configuration tab.
4. Search for the 'spark.serializer' property and change it to 'org.apache.spark.serializer.JavaSerializer' (the default there is 'org.apache.spark.serializer.KryoSerializer'). A per-application alternative is sketched after this list.
5. There is no need to restart Spark2; just make sure the client configuration is deployed.
6. Restart Livy (optional, but better to do so, to close current applications).
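If you cannot change the cluster-wide default, the same property can be set per application before the SparkContext is created. A minimal sketch (the application name is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Per-application override: force Java serialization for this job only,
// leaving the cluster-wide default untouched. 'spark.serializer' must be
// set before the SparkContext is created; it cannot be changed afterwards.
val conf = new SparkConf()
  .setAppName("dqa-job") // hypothetical name
  .set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")

val sc = new SparkContext(conf)
```

The same override also works on the command line: spark-submit --conf spark.serializer=org.apache.spark.serializer.JavaSerializer ...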
