I am trying to execute a query through HiveContext in Spark on a YARN cluster.
Version: Spark 1.5.2
Time taken for the exact same query:
Hive on Tez: 3 min
HiveContext.sql in Spark: 14 min
Findings:
The Spark execution runs in 52 stages, which complete in ~4 min, but
the INSERT OVERWRITE into the partition takes ~8-9 min (this is where data is copied from the Hive staging directory to HDFS).
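To make the breakdown explicit, here is a small sketch that only does arithmetic on the timings reported above (14 min total, ~4 min for the stages, ~8-9 min for the insert). The function name breakdown is just illustrative. It suggests the insert step accounts for roughly 57 to 64 percent of the Spark run, with about 1-2 min not attributed to either phase.

```python
# Timings reported in this post, in minutes (approximate values).
TOTAL_SPARK_MIN = 14        # whole HiveContext.sql run
STAGES_MIN = 4              # the 52 Spark stages
INSERT_MIN_RANGE = (8, 9)   # INSERT OVERWRITE step, low and high estimate


def breakdown(total, stages, insert):
    """Return (unaccounted minutes, insert share of total in percent)."""
    unaccounted = total - stages - insert
    insert_share = 100.0 * insert / total
    return unaccounted, insert_share


for insert in INSERT_MIN_RANGE:
    unaccounted, share = breakdown(TOTAL_SPARK_MIN, STAGES_MIN, insert)
    print("insert=%d min: %.0f%% of total, %d min unaccounted"
          % (insert, share, unaccounted))
```

So even if the 52 stages ran instantly, the job would still take far longer than the 3 min seen with Hive on Tez, which is why the copy from staging is the part I am asking about.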
I have seen this problem raised by many people, but I can't find any answer. This is very critical.
Note: Please suggest the optimal way to run this query in less time.
Please answer in detail so that it will be helpful for others.
@Benjamin Leonhardi, @Ravi Mutyala , @gopal