Support Questions

Spark sql query on hive tables

Hi All,

I have started experimenting with the Spark client installed on our system, but I am getting the error below while running a Spark SQL query.

The current settings are:

Spark built for Hadoop

spark.driver.maxResultSize = 5g

spark.kryoserializer.buffer = 2m

spark.kryoserializer.buffer.max = 256m


org.apache.spark.SparkException: Job aborted due to stage failure: Task 625 in stage 224854.0 failed 4 times, most recent failure: Lost task 625.3 in stage 224854.0 (TID 14802942,xxxxxxxxx): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1596. To avoid this, increase spark.kryoserializer.buffer.max value.

at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:299)

at org.apache.spark.executor.Executor$

at java.util.concurrent.ThreadPoolExecutor.runWorker(

at java.util.concurrent.ThreadPoolExecutor$


Driver stacktrace:


Re: Spark sql query on hive tables

Super Guru

@Jacob Paul

Try increasing the Kryo serializer buffer value when you build the Spark context/Spark session; serializer settings must be in place before the context is initialized, so changing them afterwards has no effect.

On Spark versions prior to 1.4, the property is named spark.kryoserializer.buffer.max.mb and takes a plain megabyte value:

conf.set("spark.kryoserializer.buffer.max.mb", "512")

On Spark 1.4 and later, use spark.kryoserializer.buffer.max with a size suffix instead:

conf.set("spark.kryoserializer.buffer.max", "512m")
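As an alternative to setting the property in code, it can be passed at submit time so it is guaranteed to be in place before the context starts. A minimal sketch, assuming Spark 1.4+ property names and illustrative buffer sizes (the application file name is a placeholder):

```
# spark-submit flags (the same lines can go in conf/spark-defaults.conf
# without the --conf prefix, using "key value" syntax)
spark-submit \
  --conf spark.kryoserializer.buffer=2m \
  --conf spark.kryoserializer.buffer.max=512m \
  your_app.py
```

Note that spark.kryoserializer.buffer.max caps the per-task serialization buffer, so it must be large enough for the biggest record being serialized.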

Refer to this and this link for more details regarding this issue.

Don't have an account?
Coming from Hortonworks? Activate your account here