I am getting org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow when I call collect on a 1 GB RDD (for example: My1GBRDD.collect).
When I run the same thing on a smaller RDD (600 MB), it executes successfully. The problem only occurs with the RDD above 1 GB.
For more details, these are the steps I perform:
1. Create an RDD from the input file.
2. Call mapToPair on the RDD.
3. Call groupByKey() on the result.
4. Call collectAsMap on the result.
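For reference, here is a simplified version of what my job does (the input path, the key extraction, and the class name are placeholders; my real job is more involved):

```java
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class KryoOverflowRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("kryo-overflow-repro")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // 1. Create an RDD from the input file ("input.txt" is a placeholder path).
        JavaRDD<String> lines = sc.textFile("input.txt");

        // 2. mapToPair: key each line by its first whitespace-separated token.
        JavaPairRDD<String, String> pairs =
                lines.mapToPair(line -> new Tuple2<>(line.split("\\s+", 2)[0], line));

        // 3. groupByKey: all values for a key end up in one CompactBuffer,
        //    which is the type named in the serialization trace below.
        JavaPairRDD<String, Iterable<String>> grouped = pairs.groupByKey();

        // 4. collectAsMap: serializes every grouped value back to the driver;
        //    this is the step where the Kryo buffer overflows for me.
        Map<String, Iterable<String>> result = grouped.collectAsMap();

        sc.stop();
    }
}
```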
At the 4th step I get the following SparkException:
org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 37
Serialization trace:
otherElements (org.apache.spark.util.collection.CompactBuffer). To avoid this, increase spark.kryoserializer.buffer.max value.
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:350)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 37
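The exception message itself suggests increasing spark.kryoserializer.buffer.max. I assume that means passing a larger value at submit time, something like this (the 512m value, the class name, and the jar name are just placeholders on my part):

```shell
spark-submit \
  --conf spark.kryoserializer.buffer.max=512m \
  --class com.example.MyJob \
  my-job.jar
```

Is raising this buffer the right fix here, or should I avoid groupByKey/collectAsMap on data this size altogether?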