Kryo serialization failed
Labels: Apache Spark
Created 08-22-2017 04:27 PM
Team,
I am getting the below error while running a Spark job:
```
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, rwlp931.rw.discoverfinancial.com): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 37. To avoid this, increase spark.kryoserializer.buffer.max value.
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:299)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:265)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```
But I don't see that property set anywhere in my server's configuration.
Created 08-22-2017 07:02 PM
If you look at the environment variables in the Spark UI, you can see that the job in question is using the serialization property below. If you can't see it in the cluster configuration, that means the user is setting it at runtime when submitting the job.
```
spark.serializer org.apache.spark.serializer.KryoSerializer
```
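You can also read the setting back from the SparkConf to confirm what a running application actually uses. A minimal sketch, assuming a spark-shell session where the SparkContext is available as `sc`:

```scala
// Minimal sketch, assuming a spark-shell session with SparkContext `sc`.
// Prints the effective serializer, falling back to the Java serializer
// that Spark uses when nothing is configured.
println(sc.getConf.get(
  "spark.serializer",
  "org.apache.spark.serializer.JavaSerializer"))

// Prints the Kryo buffer ceiling only if it was set explicitly.
println(sc.getConf.get("spark.kryoserializer.buffer.max", "not set"))
```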
Secondly, spark.kryoserializer.buffer.max is built into that serializer with a default value of 64m. If required, you can increase that value at runtime. You could even set all the Kryo serialization values at the cluster level, but that's not good practice without knowing the proper use case.
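For example, here is a minimal sketch of raising the limit when the application builds its own configuration; the 512m value and the app name are illustrative, not taken from the original post:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: enable Kryo and raise the buffer ceiling before the
// SparkContext is created, so executors pick the value up at launch.
// "512m" is an illustrative value, not a recommendation; size it to the
// largest object you expect Kryo to serialize.
val conf = new SparkConf()
  .setAppName("kryo-buffer-example") // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max", "512m")

val sc = new SparkContext(conf)
```

The same setting can be passed per job at submit time with `spark-submit --conf spark.kryoserializer.buffer.max=512m`, which leaves the cluster-wide defaults untouched.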
Hope this helps you.
