Spark jobs fail with more memory, work with default memory. Why?

We have a Spark 2.2 job, written in Scala and running on a YARN cluster, that does the following (a simplified sketch follows the list):

  1. Read several thousand small compressed Parquet files (~15 KB each) into two DataFrames
  2. Join the DataFrames on one column
  3. foldLeft over all columns to clean some data
  4. Drop duplicates
  5. Write the result DataFrame to Parquet
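
Roughly, the job looks like the sketch below. The paths, the join column, and the trim-based cleaning are placeholders; the real cleaning logic in step 3 is more involved.

  import org.apache.spark.sql.{DataFrame, SparkSession}
  import org.apache.spark.sql.functions.trim

  object CleanAndJoinJob {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("clean-and-join").getOrCreate()

      // 1. Each input directory holds several thousand small (~15 KB) compressed parquet files
      val left: DataFrame  = spark.read.parquet("hdfs:///data/input_a")
      val right: DataFrame = spark.read.parquet("hdfs:///data/input_b")

      // 2. Join on a single column ("id" is a placeholder name)
      val joined = left.join(right, Seq("id"))

      // 3. foldLeft over all columns; trim stands in for the real per-column cleaning
      val cleaned = joined.columns.foldLeft(joined) { (df, colName) =>
        df.withColumn(colName, trim(df(colName)))
      }

      // 4. Drop duplicates, then 5. write the result back out as parquet
      cleaned.dropDuplicates().write.parquet("hdfs:///data/output")

      spark.stop()
    }
  }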

The following configuration fails with java.lang.OutOfMemoryError: Java heap space:

  • --conf spark.yarn.am.memory=4g
  • --conf spark.executor.memory=20g
  • --conf spark.yarn.executor.memoryOverhead=1g
  • --conf spark.dynamicAllocation.enabled=true
  • --conf spark.shuffle.service.enabled=true
  • --conf spark.dynamicAllocation.maxExecutors=5
  • --conf spark.executor.cores=4
  • --conf spark.network.timeout=2000

However, this job runs reliably if we remove spark.executor.memory entirely, which gives each executor the default 1 GB of RAM.
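
For reference, this is how we check which values the running application actually resolved (and that executors really fall back to the default when spark.executor.memory is removed). A small sketch, assuming an existing SparkSession named spark:

  // Print what the running application actually resolved for a few settings;
  // keys that were never set fall back to Spark's defaults.
  val conf = spark.sparkContext.getConf
  Seq("spark.executor.memory",
      "spark.yarn.executor.memoryOverhead",
      "spark.executor.cores",
      "spark.dynamicAllocation.maxExecutors").foreach { key =>
    println(s"$key = ${conf.get(key, "<not set, Spark default applies>")}")
  }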

This job also fails if we do either of the following (a note on these settings follows the list):

  • Increase the number of executors
  • Increase spark.default.parallelism or spark.sql.shuffle.partitions
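
For clarity, this is how we change those two settings; the values below are examples only, not the values from the failing runs.

  // spark.sql.shuffle.partitions can be changed on a live session (default is 200):
  spark.conf.set("spark.sql.shuffle.partitions", "400")

  // spark.default.parallelism must be set before the SparkContext starts,
  // e.g. via --conf spark.default.parallelism=400 on spark-submit.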

Can anyone help me understand why more memory and more executors lead to jobs failing with OutOfMemoryError?
