Spark: INSERT OVERWRITE is very slow in Spark
- Labels: Apache Hive, Apache Spark
Created ‎03-31-2017 05:48 PM
I am running a query through HiveContext in Spark on a YARN cluster.
Version: Spark 1.5.2
Time taken for the exact same query:
Hive on Tez: 3 min
HiveContext.sql in Spark: 14 min
Findings:
The Spark execution runs in 52 stages, which complete in ~4 min, but the INSERT OVERWRITE into the partition then takes ~8-9 min (while the data is copied from the Hive staging directory to HDFS).
I have seen this problem raised by many people, but I can't find an answer. This is very critical for us.
Note: Please suggest the optimal way to run this in less time.
Please answer in detail so that it is helpful for others.
Created ‎04-25-2017 04:24 PM
Not sure if this is the problem, but how many executors are working on the insert (when viewing your job via the Spark UI)? Are you setting executor-cores?
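In case it helps to check this, here is a minimal sketch of a `spark-submit` invocation with the executor settings spelled out explicitly. On YARN without dynamic allocation, `spark-submit` defaults to 2 executors with 1 core each unless told otherwise. The helper name `build_spark_submit`, the jar name `insert-job.jar`, the class `com.example.InsertJob`, and the sizing values are all hypothetical placeholders, not taken from the original post. Only the `spark-submit` flags themselves are real. Whether this shortens the final copy from the staging directory is not established here; it only rules out an undersized job.

```python
import shlex


def build_spark_submit(app_jar, main_class, num_executors, executor_cores, executor_memory):
    """Return a spark-submit argument list with explicit executor sizing.

    All argument values are supplied by the caller; nothing here is
    specific to the job described in the question.
    """
    return [
        "spark-submit",
        "--master", "yarn-cluster",            # Spark 1.x syntax for YARN cluster mode
        "--class", main_class,
        "--num-executors", str(num_executors),   # defaults to 2 on YARN if omitted
        "--executor-cores", str(executor_cores), # defaults to 1 on YARN if omitted
        "--executor-memory", executor_memory,
        app_jar,
    ]


# Hypothetical values for illustration only; tune them to your cluster.
cmd = build_spark_submit("insert-job.jar", "com.example.InsertJob", 10, 4, "4g")

# Print a shell-safe command line that can be pasted into a terminal.
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the job is running, the Executors tab of the Spark UI shows how many executors were actually granted, which can be compared with the values requested here.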
