I am getting desperate here! My Spark2 jobs run for hours and then get stuck!
I have a 4-node cluster, each node with 16 GB RAM and 8 cores, running HDP 2.6, Spark 2.1 and Zeppelin 0.7.
I have:
- spark.executor.instances = 11
- spark.executor.cores = 2
- spark.executor.memory = 4G
- yarn.nodemanager.resource.memory-mb = 14336
- yarn.nodemanager.resource.cpu-vcores = 7
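To be explicit about which knobs I mean, the Spark-side settings expressed in code would look roughly like this (illustration only; the app name is just a placeholder, and the two YARN properties are cluster-side settings in yarn-site.xml, not Spark configs):

```scala
import org.apache.spark.sql.SparkSession

// Illustration only: the same executor settings as listed above, expressed in code.
// "predictions-insert" is just a placeholder app name.
val spark = SparkSession.builder()
  .appName("predictions-insert")
  .config("spark.executor.instances", "11")
  .config("spark.executor.cores", "2")
  .config("spark.executor.memory", "4g")
  .enableHiveSupport()
  .getOrCreate()
```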
Via Zeppelin (in the same notebook) I do an INSERT into a Hive table:
- dfPredictions.write.mode(SaveMode.Append).insertInto("default.predictions")
for a 50-column table with about 12 million records.
The job gets split into 3 stages with 75, 75 and 200 tasks. The two 75-task stages get stuck at tasks 73 and 74, and garbage collection then runs for hours. Any idea what I can try?
EDIT: I have not yet looked at tweaking the partitioning. Can anyone give me pointers on how to do that, please? My untested guess is the sketch below.
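For example, is something along these lines what is meant by adjusting the partitioning before the write? This is only a guess on my part, and 200 is an arbitrary starting number, not something I have validated:

```scala
import org.apache.spark.sql.SaveMode

// Untested sketch: repartition dfPredictions before the insert so the write
// stage runs as more, smaller tasks instead of a handful of large ones.
// 200 is only a starting guess; spark.sql.shuffle.partitions could also be
// raised to affect the shuffle stages themselves.
val repartitioned = dfPredictions.repartition(200)

repartitioned.write
  .mode(SaveMode.Append)
  .insertInto("default.predictions")
```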