Member since
11-29-2018
2
Posts
0
Kudos Received
0
Solutions
11-29-2018
07:31 AM
I figured out the problem here. Most of the Object i used in my project is complex. Ofcourse all of them implements 'Serializable' interface. In this case, the Spark engine is supposed to use 'JavaSerializer'(org.apache.spark.serializer.JavaSerializer), but by default it is using 'KryoSerializer'(org.apache.spark.serializer.KryoSerializer). Solution: 1. Go to Cloudera manager 2. Select 'spark2' service 3. Go to configuration tab 4. search for 'spark.serializer' property and change it to 'org.apache.spark.serializer.JavaSerializer', where as the default serializer is 'org.apache.spark.serializer.KryoSerializer' 5. Dont need to restart Spark2, just make sure 'client configuration' is deployed. 6. Restart livy. (optional, but better do that to close current applications)
... View more
11-29-2018
04:21 AM
Here is my code: I have similar code in both the blocks i highlighted. There is 'map' function at line 58 and the corresponding 'aggregate' function at line 64. I dont have any issue with these. When the 'map function at line 75 is executed, i get the 'Task not serializable' exception as below. Can i get some help here? I get the following exception: 2018-11-29 04:01:13.098 00000123 FATAL: org.apache.spark.SparkException: Task not serializable org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298) org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288) org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108) org.apache.spark.SparkContext.clean(SparkContext.scala:2094) org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370) org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) org.apache.spark.rdd.RDD.withScope(RDD.scala:362) org.apache.spark.rdd.RDD.map(RDD.scala:369) com.ibm.infosphere.dataquality.spark.SparkDataQualityAnalyzer.analyze(SparkDataQualityAnalyzer.scala:75) com.ibm.infosphere.dataquality.spark.SparkDataQualityAnalyzer.analyze(SparkDataQualityAnalyzer.scala:101) com.ibm.infosphere.ia.odf.services.DQALivySparkJob.processDataFrame(DQALivySparkJob.java:352) com.ibm.iis.odf.api.spark.DataframeSparkService.runAnalysis(DataframeSparkService.java:216) com.ibm.iis.odf.api.spark.ODFClouderaLivyJob.call(ODFClouderaLivyJob.java:50) com.ibm.iis.odf.api.spark.ODFClouderaLivyJob.call(ODFClouderaLivyJob.java:25) com.cloudera.livy.rsc.driver.BypassJob.call(BypassJob.java:40) com.cloudera.livy.rsc.driver.BypassJob.call(BypassJob.java:27) com.cloudera.livy.rsc.driver.JobWrapper.call(JobWrapper.java:57)
... View more
Labels:
- Labels:
-
Apache Spark