Spark SQL query on Hive tables


Hi All,

I have started experimenting with the Spark client installed on our system, but I am getting the error below while running Spark SQL.

The current setup:

Spark 1.6.1.2.4.2.0-258 built for Hadoop 2.7.1.2.4.2.0-258

spark.driver.maxResultSize = 5g
spark.kryoserializer.buffer = 2m
spark.kryoserializer.buffer.max = 256m
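(For reference, a rough sketch of how settings like these would be applied on the SparkConf in a Scala job; our actual submit setup may differ.)

import org.apache.spark.SparkConf

// Sketch only: mirrors the settings listed above on a SparkConf,
// set before the SparkContext is created.
val conf = new SparkConf()
  .set("spark.driver.maxResultSize", "5g")
  .set("spark.kryoserializer.buffer", "2m")
  .set("spark.kryoserializer.buffer.max", "256m")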




org.apache.spark.SparkException: Job aborted due to stage failure: Task 625 in stage 224854.0 failed 4 times, most recent failure: Lost task 625.3 in stage 224854.0 (TID 14802942, xxxxxxxxx): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1596. To avoid this, increase spark.kryoserializer.buffer.max value.
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:299)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:

1 Reply

Master Guru

@Jacob Paul

Try increasing the Kryo serializer buffer value when you initialize the SparkContext / SparkSession; serializer settings have to be on the SparkConf before the context is created for the executors to pick them up.

Change the property name from spark.kryoserializer.buffer.max to spark.kryoserializer.buffer.max.mb:

conf.set("spark.kryoserializer.buffer.max.mb", "512")

Refer to this and this link for more details regarding this issue.