Spark: INSERT OVERWRITE is very slow in Spark


I am trying to execute a query through HiveContext in Spark on a YARN cluster.

Spark version: 1.5.2

Time taken for the exact same query:

Hive on Tez: 3 min

HiveContext.sql in Spark: 14 min


The Spark execution runs in 52 stages, which complete in ~4 min, but the INSERT OVERWRITE into the partition then takes ~8-9 min (this is where the data is copied from the Hive staging directory to HDFS).

I have seen this problem raised by many people, but I can't find an answer anywhere. This is very critical for us.

Note: Please suggest the optimal way to run this in less time.

Please answer in detail so that it will be helpful for others.

Not sure if this is the problem, but how many executors are working on the insert (when viewing your job via the Spark UI)? Are you setting executor-cores?
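For reference, on YARN the executor settings are usually passed on the spark-submit command line. A rough sketch of what that might look like for Spark 1.5.x is below. The class name, jar name, and all the numbers are placeholders I made up for illustration, not recommendations, so adjust them to your cluster:

```shell
# Hypothetical submit command: com.example.InsertJob and insert-job.jar are
# placeholder names, and the resource numbers are illustrative only.
spark-submit \
  --master yarn-cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --class com.example.InsertJob \
  insert-job.jar
```

Once the job is running, the Executors tab in the Spark UI shows how many executors were actually granted and how many tasks each one is running during the insert step.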