Exception in Spark: Task not serializable

New Contributor

Here is my code (attached as a screenshot):

[Attachment: Code.jpg]

I have similar code in both of the blocks I highlighted.

There is a 'map' function at line 58 and the corresponding 'aggregate' function at line 64.

I don't have any issues with these.

When the 'map' function at line 75 is executed, I get the 'Task not serializable' exception below.

 

Can I get some help here?

 

I get the following exception: 

2018-11-29 04:01:13.098 00000123 FATAL: org.apache.spark.SparkException: Task not serializable
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
org.apache.spark.rdd.RDD.map(RDD.scala:369)
com.ibm.infosphere.dataquality.spark.SparkDataQualityAnalyzer.analyze(SparkDataQualityAnalyzer.scala:75)
com.ibm.infosphere.dataquality.spark.SparkDataQualityAnalyzer.analyze(SparkDataQualityAnalyzer.scala:101)
com.ibm.infosphere.ia.odf.services.DQALivySparkJob.processDataFrame(DQALivySparkJob.java:352)
com.ibm.iis.odf.api.spark.DataframeSparkService.runAnalysis(DataframeSparkService.java:216)
com.ibm.iis.odf.api.spark.ODFClouderaLivyJob.call(ODFClouderaLivyJob.java:50)
com.ibm.iis.odf.api.spark.ODFClouderaLivyJob.call(ODFClouderaLivyJob.java:25)
com.cloudera.livy.rsc.driver.BypassJob.call(BypassJob.java:40)
com.cloudera.livy.rsc.driver.BypassJob.call(BypassJob.java:27)
com.cloudera.livy.rsc.driver.JobWrapper.call(JobWrapper.java:57)
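
For context, the failing pattern looks roughly like the following sketch (the names here are illustrative, not the actual project code, which is only in the attached screenshot):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative helper: note that it does NOT extend Serializable
class ColumnAnalyzer {
  def score(value: String): Int = value.length
}

object Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("repro").setMaster("local[*]"))
    val analyzer = new ColumnAnalyzer // exists only on the driver

    val rdd = sc.parallelize(Seq("a", "bb", "ccc"))
    // The closure captures 'analyzer'; Spark must serialize the closure to
    // ship the task to executors, so this map throws 'Task not serializable'.
    val scores = rdd.map(row => analyzer.score(row))
    println(scores.collect().mkString(","))
  }
}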

 

 

1 REPLY

New Contributor

I figured out the problem here.

 

Most of the objects I use in my project are complex. Of course, all of them implement the 'Serializable' interface.

In this case, the Spark engine needs to use 'JavaSerializer' (org.apache.spark.serializer.JavaSerializer), but by default it is using 'KryoSerializer' (org.apache.spark.serializer.KryoSerializer).
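
If changing the cluster-wide setting is not an option, the same property can also be set per application when building the SparkConf (a minimal sketch; the application name is illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("dqa-job") // illustrative name
  // Override the cluster default (KryoSerializer) for this application only
  .set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")

val spark = SparkSession.builder().config(conf).getOrCreate()

Note that 'spark.serializer' has to be set before the SparkContext is created; it cannot be changed on a running application.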

 

Solution:

1. Go to Cloudera Manager.
2. Select the 'spark2' service.
3. Go to the Configuration tab.
4. Search for the 'spark.serializer' property and change it to 'org.apache.spark.serializer.JavaSerializer' (the default there is 'org.apache.spark.serializer.KryoSerializer').
5. There is no need to restart Spark2; just make sure the client configuration is deployed.
6. Restart Livy (optional, but better to do so in order to close current applications).
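
After the client configuration is deployed, you can confirm from spark2-shell on a gateway node that applications pick up the new setting (a quick check, assuming the shell-provided 'sc'):

// In spark2-shell, 'sc' is the SparkContext created by the shell.
// The second argument is what gets printed if the property is unset.
println(sc.getConf.get("spark.serializer", "<unset: Spark built-in default>"))
// Expected output after the change:
// org.apache.spark.serializer.JavaSerializer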