I am writing a Spark Streaming job that reads messages from Kafka. The serializer configured for the job is KryoSerializer. When I run the job, I encounter the exception below:
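The job is set up roughly like the sketch below (broker, topic, and group id are placeholders, not the real values):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaStreamJob {
  def main(args: Array[String]): Unit = {
    // Kryo is set explicitly via spark.serializer (see the config dump below)
    val conf = new SparkConf()
      .setAppName("kafka-stream")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer"  -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"          -> "example-group",          // placeholder group id
      "auto.offset.reset" -> "latest"
    )

    // Direct stream against the 0.10 Kafka integration
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("example-topic"), kafkaParams)
    )

    stream.map(record => record.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}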
18/10/31 16:54:02 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 6, *****, executor 4): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 297. To avoid this, increase spark.kryoserializer.buffer.max value.
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:318)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:386)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 297
at com.esotericsoftware.kryo.io.Output.require(Output.java:163)
at com.esotericsoftware.kryo.io.Output.writeString_slow(Output.java:462)
at com.esotericsoftware.kryo.io.Output.writeString(Output.java:363)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:191)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:184)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:366)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:307)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:315)
... 4 more
Even after specifying the Kryo buffer size and the Kryo max buffer size, I am still encountering this exception. I am seeing it only with the Cloudera Spark 2.2.0 distribution, not with vanilla Spark 2.2.0.
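Concretely, the buffer settings are applied on the SparkConf as sketched here (the initial-buffer value is illustrative; only spark.kryoserializer.buffer.max appears in the dump below):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // initial per-core Kryo buffer; illustrative value, not shown in the dump below
  .set("spark.kryoserializer.buffer", "1m")
  // hard cap the buffer may grow to; matches the 512m in the dump below
  .set("spark.kryoserializer.buffer.max", "512m")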
The full job configuration is given below.
(spark.driver.userClassPathFirst,true)
(spark.executor.extraLibraryPath,/var/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/lib/native)
(spark.driver.memory,4g)
(spark.authenticate,false)
(spark.yarn.jars,local:/var/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/jars/*)
(spark.driver.extraLibraryPath,/var/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/lib/native)
(spark.yarn.historyServer.address,http://****18089)
(spark.yarn.am.extraLibraryPath,/var/opt/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/lib/native)
(spark.eventLog.enabled,true)
(spark.dynamicAllocation.schedulerBacklogTimeout,1)
(spark.yarn.config.gatewayPath,/var/opt/cloudera/parcels)
(spark.ui.killEnabled,true)
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.executor.extraJavaOptions,-Djava.security.auth.login.config=/tmp/jaas.conf)
(spark.dynamicAllocation.executorIdleTimeout,60)
(spark.dynamicAllocation.minExecutors,0)
(spark.hadoop.yarn.application.classpath,)
(spark.shuffle.service.enabled,true)
(spark.yarn.config.replacementPath,{{HADOOP_COMMON_HOME}}/../../..)
(spark.ui.enabled,true)
(spark.io.encryption.enabled,false)
(spark.driver.extraJavaOptions,-Djava.security.auth.login.config=/tmp/jaas.conf)
(spark.sql.hive.metastore.version,1.1.0)
(spark.kryoserializer.buffer.max,512m)
(spark.submit.deployMode,client)
(spark.shuffle.service.port,7337)
(spark.network.crypto.enabled,false)
(spark.hadoop.mapreduce.application.classpath,)
(spark.eventLog.dir,hdfs:///user/spark/spark2ApplicationHistory)
(spark.master,yarn)
(spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON,/var/opt/cloudera/parcels/Anaconda-4.3.1/bin/python)
(spark.dynamicAllocation.enabled,true)
(spark.sql.catalogImplementation,hive)
(spark.yarn.appMasterEnv.PYSPARK_PYTHON,/var/opt/cloudera/parcels/Anaconda-4.3.1/bin/python)
(spark.sql.hive.metastore.jars,${env:HADOOP_COMMON_HOME}/../hive/lib/*:${env:HADOOP_COMMON_HOME}/client/*)
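For completeness, a quick way to confirm which serializer settings the running job actually sees is to print them from the live SparkContext (a minimal sketch, assuming a SparkSession named spark):

// Print the serializer-related entries from the effective runtime conf
spark.sparkContext.getConf.getAll
  .filter { case (k, _) => k == "spark.serializer" || k.startsWith("spark.kryoserializer") }
  .foreach { case (k, v) => println(s"$k = $v") }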