08-21-2019 12:12 AM
I am getting "org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow" when I execute collect on a 1 GB RDD (for example: My1GBRDD.collect).
When I run the same operation on a smaller RDD (600 MB), it completes successfully; the problem appears only with the RDD above 1 GB.
For more details, these are the steps I perform:
1. Create an RDD from the input file.
2. Call mapToPair on the RDD.
3. Call groupByKey() on the RDD.
4. Call collectAsMap on the RDD.
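The four steps above can be sketched as a Java Spark job. This is a minimal reconstruction for illustration only: the input path, the comma-split parsing, and the class name are my assumptions, not details from the post.

```java
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class GroupAndCollect {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("GroupAndCollect");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // 1. Create an RDD from the input file (path is a placeholder).
        JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input");

        // 2. mapToPair: turn each line into a (key, value) pair
        //    (comma-split is an assumed record format).
        JavaPairRDD<String, String> pairs = lines.mapToPair(line -> {
            String[] parts = line.split(",", 2);
            return new Tuple2<>(parts[0], parts.length > 1 ? parts[1] : "");
        });

        // 3. groupByKey: gather all values per key. Internally this builds
        //    CompactBuffers, which is the type named in the serialization
        //    trace of the exception below.
        JavaPairRDD<String, Iterable<String>> grouped = pairs.groupByKey();

        // 4. collectAsMap: ships every grouped value to the driver. Each
        //    task result must be Kryo-serialized in one buffer, which is
        //    where the overflow is raised.
        Map<String, Iterable<String>> result = grouped.collectAsMap();

        sc.stop();
    }
}
```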
On the 4th step I get the SparkException as follows:

org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 37
Serialization trace: otherElements (org.apache.spark.util.collection.CompactBuffer).
To avoid this, increase spark.kryoserializer.buffer.max value.
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:350)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 37
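As the message itself suggests, the immediate workaround I am considering is raising spark.kryoserializer.buffer.max (default 64m; it must stay below 2048m). A sketch of the config, assuming spark-submit (the 512m value is an illustrative guess, not a measured requirement):

```
# spark-submit flag (or set the same key in spark-defaults.conf)
--conf spark.kryoserializer.buffer.max=512m
```

That said, since groupByKey followed by collectAsMap pulls all grouped data onto the driver, the buffer setting only postpones the problem as the data grows.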
Labels:
- Apache Spark