Why is my Spark job stuck?
Created ‎11-29-2017 09:34 PM
I am getting desperate here! My Spark2 jobs take hours then get stuck!
I have a 4-node cluster; each node has 16 GB RAM and 8 cores. I run HDP 2.6, Spark 2.1 and Zeppelin 0.7.
I have:
- spark.executor.instances = 11
- spark.executor.cores = 2
- spark.executor.memory = 4G
- yarn.nodemanager.resource.memory-mb = 14336
- yarn.nodemanager.resource.cpu-vcores = 7
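By my math the request should fit the cluster, unless I am misreading the defaults: assuming the usual YARN executor memory overhead (10% of executor memory, minimum 384 MB), 11 executors × (4 G + ~0.4 G) ≈ 48 G against 4 × 14 G = 56 G of NodeManager memory, and 11 × 2 = 22 of the 28 available vcores, which leaves room for the driver and application master.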
Via Zeppelin (same notebook) I do an INSERT into a Hive table:
- dfPredictions.write.mode(SaveMode.Append).insertInto("default.predictions")
for a 50-column table with about 12 million records.
This gets split into 3 stages of 75, 75 and 200 tasks. The two 75-task stages get stuck at tasks 73 and 74, and garbage collection runs for hours. Any idea what I can try?
EDIT: I have not looked at tweaking partitions. Can anyone give me pointers on how to do that, please?
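Is explicitly repartitioning before the write the right direction? Something like this untested sketch (the count of 200 is just a placeholder I would tune):

import org.apache.spark.sql.SaveMode

// Repartition so the insert stage runs with a chosen number of tasks
// instead of inheriting the partitioning of the previous stage.
dfPredictions
  .repartition(200)   // placeholder value, needs tuning
  .write
  .mode(SaveMode.Append)
  .insertInto("default.predictions")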
Created ‎11-30-2017 09:12 AM
Check whether SPARK_HOME in the interpreter settings points to the correct Spark2 client.
Is it set to the value below?
SPARK_HOME = /usr/hdp/current/spark2-client/
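You can also print what the interpreter process actually sees from a %spark paragraph (quick check, using the sc that Zeppelin provides):

println(sc.version)                   // should report 2.1.x
println(System.getenv("SPARK_HOME"))  // should match the path above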
Where are you setting the Spark properties, in spark-env.sh or via Zeppelin? Check this issue:
https://issues.apache.org/jira/browse/ZEPPELIN-295
Set spark.driver.memory=4G and spark.driver.cores=2.
Check spark.memory.fraction (if it is set to 0.75, reduce it to 0.6): https://issues.apache.org/jira/browse/SPARK-15796
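To confirm which values the Spark2 interpreter actually applied, you can read the conf back in a %spark paragraph (sketch, using the Zeppelin-provided sc):

// Print the effective value of each setting, or "<not set>" if it was never applied.
Seq("spark.driver.memory", "spark.driver.cores",
    "spark.executor.memory", "spark.executor.cores",
    "spark.executor.instances", "spark.memory.fraction")
  .foreach(k => println(s"$k = ${sc.getConf.getOption(k).getOrElse("<not set>")}"))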
Check the logs: run tail -f /var/log/zeppelin/zeppelin-interpreter-spark2-spark-zeppelin-{HOSTNAME}.log on the Zeppelin host.
