Kryo serialization failed
Labels: Apache Spark
Created 08-22-2017 04:27 PM
Team,
I am getting the below error while running a Spark job:
```
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, rwlp931.rw.discoverfinancial.com): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 37. To avoid this, increase spark.kryoserializer.buffer.max value.
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:299)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:265)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
But I don't see that property set anywhere in my server's configuration.
Created 08-22-2017 07:02 PM
If you look at the environment variables in the Spark UI, you can see that the job in question is using the serialization property below. If you can't see it in the cluster configuration, that means the user is setting it at runtime when submitting the job.
```
spark.serializer org.apache.spark.serializer.KryoSerializer
```
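You can also read the setting back from the SparkConf to confirm what a running application actually uses. A minimal sketch, assuming a spark-shell session where the SparkContext is available as `sc`:

```scala
// Minimal sketch, assuming a spark-shell session with SparkContext `sc`.
// Prints the effective serializer, falling back to the Java serializer
// that Spark uses when nothing is configured.
println(sc.getConf.get(
  "spark.serializer",
  "org.apache.spark.serializer.JavaSerializer"))

// Prints the Kryo buffer ceiling only if it was set explicitly.
println(sc.getConf.get("spark.kryoserializer.buffer.max", "not set"))
```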
Secondly, spark.kryoserializer.buffer.max is built into that serializer with a default value of 64m. If required, you can increase that value at runtime. You could even set all the Kryo serialization values at the cluster level, but that's not good practice without knowing the proper use case.
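For example, here is a minimal sketch of raising the limit when the application builds its own configuration; the 512m value and the app name are illustrative, not taken from the original post:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: enable Kryo and raise the buffer ceiling before the
// SparkContext is created, so executors pick the value up at launch.
// "512m" is an illustrative value, not a recommendation; size it to the
// largest object you expect Kryo to serialize.
val conf = new SparkConf()
  .setAppName("kryo-buffer-example") // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max", "512m")

val sc = new SparkContext(conf)
```

The same setting can be passed per job at submit time with `spark-submit --conf spark.kryoserializer.buffer.max=512m`, which leaves the cluster-wide defaults untouched.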
Hope this helps you.
