
Spark: INSERT OVERWRITE is very, very slow


I am trying to execute a query through HiveContext in Spark on a YARN cluster.

Spark version: 1.5.2

Time taken for the exact same query:

Hive on Tez: 3 min

HiveContext.sql in Spark: 14 min


The Spark job runs in 52 stages, which complete in ~4 min, but the INSERT OVERWRITE into the partition then takes ~8-9 min (this is the step where data is copied from the Hive staging directory to HDFS).
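For readers who want to reproduce the measurement, here is a minimal sketch of issuing an INSERT OVERWRITE into a static partition through HiveContext.sql() and timing it on the driver. The table names, partition column and SELECT are hypothetical placeholders, not the poster's query. `sql_context` is assumed to be a `pyspark.sql.HiveContext` (anything with a `.sql` method works), and the timing assumes the INSERT runs when `sql()` is called, which is how Spark 1.x handles INSERT commands.

```python
import time


def build_insert_overwrite(target_table, partition_spec, select_query):
    """Return a Hive INSERT OVERWRITE statement for one static partition.

    partition_spec maps partition columns to literal values, e.g. {"dt": "2016-01-01"}.
    Values are quoted as string literals and are assumed not to contain quotes.
    """
    spec = ", ".join("{}='{}'".format(col, val) for col, val in partition_spec.items())
    return "INSERT OVERWRITE TABLE {} PARTITION ({}) {}".format(
        target_table, spec, select_query)


def timed_insert_overwrite(sql_context, target_table, partition_spec, select_query):
    """Run the statement via sql_context.sql() and return elapsed wall-clock seconds.

    sql_context is assumed to be a pyspark.sql.HiveContext (hypothetical usage:
    HiveContext(sc) created from the job's SparkContext).
    """
    statement = build_insert_overwrite(target_table, partition_spec, select_query)
    start = time.time()
    sql_context.sql(statement)  # the INSERT is executed as part of this call
    return time.time() - start
```

On the cluster this would be called as `timed_insert_overwrite(HiveContext(sc), "db.target", {"dt": "2016-01-01"}, "SELECT ... FROM db.source")`; comparing the returned time with the stage durations shown in the Spark UI gives a rough idea of how much time is spent after the last stage finishes.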

I have seen this problem raised by many people, but I can't find an answer. This is very critical for us.

Note: Please suggest the optimal way to reduce the execution time.

Please answer in detail so that it will be helpful for others.

@Benjamin Leonhardi, @Ravi Mutyala , @gopal



Not sure if this is the problem, but how many executors are working on the insert (when viewing your job in the Spark UI)? Are you setting --executor-cores?
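To make the executor-cores question concrete: within a stage, Spark runs at most (number of executors × executor cores ÷ spark.task.cpus) tasks at the same time, and a stage with more tasks than that runs in several waves. A small sketch of that arithmetic follows; the executor and task counts are made-up examples, not figures from this job.

```python
import math


def concurrent_task_slots(num_executors, executor_cores, task_cpus=1):
    """Tasks Spark can run at once: each executor runs
    executor_cores // task_cpus tasks in parallel (spark.task.cpus defaults to 1)."""
    return num_executors * (executor_cores // task_cpus)


def task_waves(num_tasks, slots):
    """Number of rounds ('waves') a stage needs when every slot is kept busy."""
    if slots <= 0:
        raise ValueError("need at least one task slot")
    return int(math.ceil(num_tasks / float(slots)))
```

For example, 10 executors with 4 cores each give `concurrent_task_slots(10, 4) == 40` slots, so a 200-task stage needs `task_waves(200, 40) == 5` waves; with YARN's default of 1 core per executor, the same 10 executors give 10 slots and 20 waves. On YARN these values are set with the `--num-executors` and `--executor-cores` options of spark-submit.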