
Why is my Spark job stuck?

Expert Contributor

I am getting desperate here! My Spark2 jobs take hours then get stuck!

I have a 4-node cluster, each node with 16 GB RAM and 8 cores. I run HDP 2.6, Spark 2.1 and Zeppelin 0.7.

I have set:

  1. spark.executor.instances = 11
  2. spark.executor.cores = 2
  3. spark.executor.memory = 4G
  4. yarn.nodemanager.resource.memory-mb = 14336
  5. yarn.nodemanager.resource.cpu-vcores = 7
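For context, here is a quick sketch of the YARN container arithmetic these settings imply, assuming Spark 2.1's default executor memory overhead of max(384 MB, 10% of spark.executor.memory) and ignoring YARN's rounding to the scheduler's minimum allocation:

```scala
// Rough YARN container math for the cluster above (4 nodes, 14336 MB / 7 vcores each).
// spark.yarn.executor.memoryOverhead defaults to max(384, 0.10 * executorMemory) in Spark 2.1.
val executorMemoryMb = 4096
val overheadMb       = math.max(384, (0.10 * executorMemoryMb).toInt) // 409
val containerMb      = executorMemoryMb + overheadMb                  // 4505

val nodeMemoryMb = 14336
val nodeVcores   = 7
val execsByMem   = nodeMemoryMb / containerMb // 3 executors per node by memory
val execsByCores = nodeVcores / 2             // 3 executors per node by cores

println(s"Max executors per node: ${math.min(execsByMem, execsByCores)}") // 3
println(s"Cluster max: ${4 * math.min(execsByMem, execsByCores)}")        // 12
```

So 11 executors plus the driver and application master essentially fill the cluster, leaving almost no headroom; trying fewer or smaller executors is a reasonable experiment.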

Via Zeppelin (same notebook) I do an INSERT into a Hive table:

  dfPredictions.write.mode(SaveMode.Append).insertInto("default.predictions")

for a 50 column table with about 12 million records.

This gets split into 3 stages of 75, 75 and 200 tasks. The two 75-task stages get stuck at tasks 73 and 74, where garbage collection runs for hours. Any idea what I can try?

EDIT: I have not looked at tweaking partitions, can anyone give me pointers on how to do that, please?
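One common way to tweak partitions is to repartition the DataFrame just before the write, so each task handles a smaller slice of the 12 million rows. A sketch against the insert above; the partition count of 200 is illustrative, not a tuned value:

```scala
import org.apache.spark.sql.SaveMode

// dfPredictions is the DataFrame from the notebook above.
// repartition(200) shuffles the ~12M rows into 200 roughly equal partitions,
// so no single task (and its garbage collector) absorbs a skewed share of the data.
dfPredictions
  .repartition(200)
  .write
  .mode(SaveMode.Append)
  .insertInto("default.predictions")
```

If the skew follows a particular key, repartition(200, col("someColumn")) distributes by that column instead; spark.sql.shuffle.partitions controls the default width (200) of SQL shuffle stages.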

1 REPLY

Expert Contributor

Check whether SPARK_HOME in the Zeppelin interpreter settings points to the correct Spark 2 client.

Is it set to the value below?

SPARK_HOME=/usr/hdp/current/spark2-client/

Where are you setting the Spark properties, in spark-env.sh or via Zeppelin? Check this thread:

https://issues.apache.org/jira/browse/ZEPPELIN-295

Try setting spark.driver.memory=4G and spark.driver.cores=2.

Check spark.memory.fraction: if it is set to 0.75, reduce it to 0.6 (see https://issues.apache.org/jira/browse/SPARK-15796).
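Taken together, the property changes suggested above would look like this in spark-defaults.conf or in the Zeppelin spark2 interpreter settings (a sketch; the values are the ones proposed in this thread):

```
spark.driver.memory     4g
spark.driver.cores      2
spark.memory.fraction   0.6
```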

Check the logs: run tail -f /var/log/zeppelin/zeppelin-interpreter-spark2-spark-zeppelin-{HOSTNAME}.log on the Zeppelin host.
