Yarn Container Sizing Mapreduce/Spark

New Contributor

Team - I am still trying to grasp the idea of performance tuning YARN in CDH. I have a few doubts I want to clear up before I start looking into the details more. Please keep in mind that I have very minimal knowledge of Java, and I apologize if I'm asking very basic questions. How do we tune YARN? Questions below.

 

1. What is the difference between an HDFS block and an input split? Is an input split equivalent to a block in MapReduce? I know the concept of 1 block = 1 mapper container, but what is the difference between a block and a split? Please explain in detail.

 

2. How much spilling in MR jobs is bad? How many spilled records is too many?

 

3. If we have a 128 MB file and my HDFS block size is 64 MB, then I have 2 blocks, which get two mapper containers, and each of those mapper containers has 1 GB allocated to it and 800 MB of heap for the mapper. Why does it take a 1 GB mapper container to process 64 MB of data? Why does each 1 GB mapper container need 800 MB of heap?

 

4. What does it mean when I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" for a MapReduce job in the map or reduce phase?

 

5. What is the block count issue when I use small files? I know the block size / storage isn't being wasted, but how does this affect the performance of MR jobs?

 

6. Spark is very confusing as to how it works with YARN. Where do we even begin to fix Spark jobs on YARN? There is no proper book or documentation explaining the Spark architecture.

 


2 REPLIES

Re: Yarn Container Sizing Mapreduce/Spark

Master Guru
> 1. What is the difference between an HDFS block and an input split? Is an input split equivalent to a block in MapReduce? I know the concept of 1 block = 1 mapper container, but what is the difference between a block and a split? Please explain in detail.

An InputSplit is a loose guideline given to a map task: start reading the first record from around X, and stop reading at the last record around Y, for an InputSplit of (X, Y).

A block in HDFS is an arbitrary data boundary (64 mebibytes, 128 mebibytes, etc.) that HDFS chunks files into, but developers generally don't have to worry about this - it's abstracted away in the FileSystem APIs for you.

It would be helpful if you go through https://wiki.apache.org/hadoop/HadoopMapReduce completely.
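
For a rough illustration of the difference, here is a minimal Java sketch (the input path and the split sizes below are made up for the example): the block size is a per-file HDFS storage property, while the split size is a per-job input setting that merely defaults to the block size.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitVsBlock {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Block size: how HDFS physically chunked this file when it was written.
    FileSystem fs = FileSystem.get(conf);
    long blockSize = fs.getFileStatus(new Path("/data/input/part-0000")).getBlockSize();
    System.out.println("HDFS block size: " + blockSize);

    // Split size: the logical (start, length) range handed to each map task.
    // It defaults to the block size, but the job can narrow or widen it.
    Job job = Job.getInstance(conf, "split-vs-block");
    FileInputFormat.addInputPath(job, new Path("/data/input"));
    FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
    FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
  }
}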

> 2. How much spilling in MR jobs is bad? How many spilled records is too many?

Spilling is how we get by when sorting large amounts of data with too little available memory. It's not so much a bad thing as it is a feature of MR.

It's bad if you spill most of your map input, but a small percentage of spills is OK. Ideally your map should not spend a lot of time spilling - that's where your tuning focus should be.
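
If it helps, a quick way to see how much you spill is to compare two task counters after the job finishes. The Java sketch below (with illustrative values, not recommendations for your cluster) also shows the two knobs that are usually tuned to reduce spills:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class SpillCheck {
  public static void checkSpills(Job completedJob) throws Exception {
    Counters counters = completedJob.getCounters();
    long mapOutput = counters.findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
    long spilled   = counters.findCounter(TaskCounter.SPILLED_RECORDS).getValue();
    // If spilled records is a large multiple of map output records,
    // the maps are spilling (and re-merging) repeatedly.
    System.out.println("spilled/output ratio: " + (double) spilled / mapOutput);
  }

  public static void tuneSortBuffer(Configuration conf) {
    // A larger in-memory sort buffer means fewer spills (at the cost of map heap).
    conf.setInt("mapreduce.task.io.sort.mb", 256);
    conf.setFloat("mapreduce.map.sort.spill.percent", 0.9f);
  }
}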

> 3. If we have a 128 MB file and my HDFS block size is 64 MB, then I have 2 blocks, which get two mapper containers, and each of those mapper containers has 1 GB allocated to it and 800 MB of heap for the mapper. Why does it take a 1 GB mapper container to process 64 MB of data? Why does each 1 GB mapper container need 800 MB of heap?

You forget that maps need to sort and partition their input data too (see your own spills question before this one). Aside from that, a JVM needs heap space to load up other internal work such as protocol and counter-update communications, class loading, etc., as well as your code, which may hold variables in various ways for various periods during its life. You need all that extra memory to accommodate these needs generically and comfortably.
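
The usual arrangement looks like the sketch below (the 1024/800 numbers are just the ones from your example, not required values): the container figure is what YARN enforces on the whole process, and the -Xmx heap inside it is deliberately smaller so there is headroom for the non-heap pieces mentioned above.

import org.apache.hadoop.conf.Configuration;

public class MapperMemory {
  public static void size(Configuration conf) {
    // Full container size as seen and enforced by YARN (MB).
    conf.setInt("mapreduce.map.memory.mb", 1024);
    // JVM heap inside that container, typically around 80% of the container,
    // leaving room for stacks, metaspace, native buffers, etc.
    conf.set("mapreduce.map.java.opts", "-Xmx800m");
  }
}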

> 4. What does it mean when I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" for a MapReduce job in the map or reduce phase?

In general the error simply means your heap was insufficient for the operation you attempted. The GC overhead limit indicates that the JVM had to throw an OOME because the GC was doing more work than the actual code (70-80% is the threshold).

There's no one-shot answer to this - it greatly depends on what your map task code does, what your largest record size is, how much heap you allocated to it, etc. Understanding or resolving it is no different from debugging a local Java application - figure out why your task requires more memory than it would for a simpler 1:1 identity map operation.
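
As a hypothetical illustration (not your job), the pattern below is one of the most common sources of that error in reducers: buffering all values for a key instead of streaming over them.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
      throws IOException, InterruptedException {
    // Risky pattern: copying every value into an in-memory List means a hot
    // key with millions of values can exhaust the heap and trigger
    // "GC overhead limit exceeded".
    // Safer pattern: aggregate while iterating, keeping only a running total on heap.
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    ctx.write(key, new LongWritable(sum));
  }
}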

> 5. What is the block count issue when I use small files? I know the block size / storage isn't being wasted, but how does this affect the performance of MR jobs?

The problem is the same as having a large number of small files on any file system: you spend a lot of time in the overhead of opening and closing files compared to working with one or a few larger files. In a parallel context, MR divides work based on the number of files, and the JVMs take a few seconds of runtime to properly tune their JIT optimisers. If you have tiny files, such as a few kilobytes each, your maps run for shorter durations and in greater overall numbers, adding a lot more JVM startup/spin-down/respawn overhead than if a map were asked to process a single wider block of 128 mebibytes.

HDFS-wise, the DN and NN keep all block metadata and file metadata in memory, so a lot of tiny files adds to memory consumption that you could instead keep lower with fewer, larger files.

We do keep finding ways to optimise the memory representations of these metadata, and CDH5 does a whole lot better than its predecessors, but the data's still up there, and there's still the processing problem (explained earlier).
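
On the MR side, one common mitigation (sketched below for a plain text-input job; the 128 MiB figure is just an example) is to pack many small files into fewer, larger splits so each map gets a meaningful amount of work:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

public class SmallFilesJob {
  public static void configure(Job job) {
    // CombineTextInputFormat groups many small files into one split per map.
    job.setInputFormatClass(CombineTextInputFormat.class);
    // Aim for roughly one HDFS-block-sized split per map (128 MiB here).
    CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
  }
}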

> 6. Spark is very confusing as to how it works with YARN. Where do we even begin to fix Spark jobs on YARN? There is no proper book or documentation explaining the Spark architecture.

You'll need to be more specific here (preferably in its own thread under the Spark board). Spark runs its own work-distribution mechanism over YARN, just as MR has its maps and reduces. It runs a "cluster manager" (the MR equivalent of what you may call the JobTracker) in the ApplicationMaster, and runs its executors as containers (the MR equivalent of what you may call map or reduce tasks). The docs at http://spark.apache.org/ do explain how the base mode works, and YARN mode is no different except that the resource scheduling is done by the RM instead of by Spark itself - centralising it into the cluster alongside MR.
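
To make that mapping concrete, here is a minimal (hypothetical) Java Spark application; the executor settings are standard Spark properties, and the master URL is normally supplied by spark-submit (--master yarn) rather than hard-coded:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkOnYarnSketch {
  public static void main(String[] args) {
    // Each executor is a YARN container; the driver (running in the
    // ApplicationMaster in cluster mode) asks the ResourceManager for them.
    SparkConf conf = new SparkConf()
        .setAppName("spark-on-yarn-sketch")
        .set("spark.executor.instances", "4")   // number of executor containers
        .set("spark.executor.memory", "2g")     // heap per executor container
        .set("spark.executor.cores", "2");      // vcores per executor container

    JavaSparkContext sc = new JavaSparkContext(conf);
    long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
    System.out.println("count = " + count);
    sc.stop();
  }
}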

Re: Yarn Container Sizing Mapreduce/Spark

Contributor
Hi Zain,
 
In addition to what Harsh has posted, 
 
> Where do we even begin to [run] Spark jobs on YARN? There is no proper book or documentation
> explaining the Spark architecture.
There are a number of resources available for learning about Spark, from Cloudera, third parties, and the upstream community.
 
The following link lists the resources we offer in terms of use cases, presentations, documentation, books, and training: