
Facing memory issues while running Spark Job


Explorer

I am facing some issues while running Spark (1.6) jobs in YARN cluster mode with the configuration below:

--master yarn
--deploy-mode cluster
--executor-cores 8
--num-executors 3
--executor-memory 25G
--driver-memory 6g
--conf spark.network.timeout=10000000
--conf spark.cores.max=35
--conf spark.memory.fraction=0.6
--conf spark.memory.storageFraction=0.5
--conf spark.shuffle.memoryFraction=1

Also, I am setting spark.sql.shuffle.partitions=30 in spark config.xml.
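
For reference, the same property can also be passed directly on the spark-submit command line rather than through a config file; the line below only illustrates the syntax with the value already in use:

--conf spark.sql.shuffle.partitions=30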

I am running the job with the above command on a three-node Hortonworks cluster where each node has around 51 GB of memory available. The input data is approximately 254 million records. The job crashes while inserting data into Hive, with executors lost and exit code 143. There is very heavy shuffling of data during processing.

Can you please suggest what can be done to resolve this issue?

Also, how can we determine, based on the input size, the memory parameters to be used for running the job?

1 REPLY

Re: Facing memory issues while running Spark Job

Expert Contributor

@Neha Jain Since you mentioned there is a lot of shuffling of data, try increasing the driver memory and re-running the job.
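
For example, a resubmission along the lines below would give the driver more headroom. This is only a sketch: the 10g driver memory is an illustrative value, <your-application-jar> is a placeholder for the actual job, and the remaining flags are kept as in the original submission, so tune them for your cluster.

# illustrative only: same flags as the original submission, with a larger driver
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 8 \
  --num-executors 3 \
  --executor-memory 25G \
  --driver-memory 10g \
  --conf spark.network.timeout=10000000 \
  --conf spark.cores.max=35 \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.5 \
  --conf spark.shuffle.memoryFraction=1 \
  <your-application-jar>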
