New MapReduce Job creation Pattern

I understand that the classes in org.apache.hadoop.mapred are all part of the "old" MapReduce version 1 API, and that those in org.apache.hadoop.mapreduce are all part of the new version 2 API. I am running the CDH4.5 pre-built VM, which uses Hadoop 2.0.0 and YARN. I want to run a MapReduce job that uses the new API, not the old one.

So I looked at the javadoc examples for org.apache.hadoop.mapreduce.Job and org.apache.hadoop.util.Tool. The problem is that the example in the Tool javadocs still uses the old org.apache.hadoop.mapred classes. I tried following the example from the Job class, only to discover that all of its constructors are deprecated and that it has no static factory methods. Later javadocs appear to show static factory methods for Job, but my version has none.

So what is the new Job building/running pattern? Should I follow the example in Tool? In Job? Or something else?

If I follow the example in the Tool class, and use the old org.apache.hadoop.mapred classes, does that mean I lose all the advantages that map-reduce version 2 has over version 1?  Or is it still leveraging the new architecture behind the scenes?

Re: New MapReduce Job creation Pattern

Hi Michael,

As far as I know, MapReduce V1 versus V2 is not about the mapred or mapreduce API package; it is about the platform that runs the job. When you run a job on MapReduce V1, it is managed by the JobTracker and TaskTrackers. When you run it on MapReduce V2, the functions of the JobTracker are split among three services:

  1. ResourceManager
    • The ResourceManager is a persistent YARN service that receives and runs MapReduce jobs on the cluster. It contains the scheduler, which, as before, is pluggable.
  2. MapReduce Application Master
    • The MapReduce-specific capabilities of the JobTracker have been moved into the MapReduce Application Master. One is started to manage each MapReduce job and is terminated when the job completes.
  3. JobHistoryServer
    • The JobTracker’s function of serving information about completed jobs has been moved to the JobHistoryServer.

For a more detailed explanation, see the link below:

https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/...

Examples and a code walkthrough are available at the link below:

http://hadoop.apache.org/docs/stable1/mapred_tutorial.html

That means the example you see on the Tool class page will still leverage the new MapReduce V2 architecture.

Regards,
Chirag Patadia.

Re: New MapReduce Job creation Pattern

Yes, I believe I know exactly what you're talking about (the "mapred" vs. "mapreduce" classes). Both are considered part of "MapReduce v1".

I found this online course pretty helpful in giving me a good template for creating map, reduce and driver classes with the newer "mapreduce" API in v1:

http://bigdatauniversity.com/bdu-wp/bdu-course/introduction-to-mapreduce-programming/
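
For reference, a driver written against the newer org.apache.hadoop.mapreduce API might look like the minimal sketch below. It assumes a Hadoop release where Job.getInstance(Configuration, String) is available (on a release without it, the deprecated new Job(Configuration, String) constructor would be the fallback). The class names WordCountDriver, TokenMapper and SumReducer are made up for illustration, and the word-count logic is only a placeholder for a real job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: Tool/ToolRunner for option parsing, new-API Job for the job itself.
public class WordCountDriver extends Configured implements Tool {

  // New-API mapper: extends the Mapper class from org.apache.hadoop.mapreduce.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (token, 1) for every whitespace-separated token in the line.
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // New-API reducer: sums the counts for each token.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: WordCountDriver <input path> <output path>");
      return 2;
    }
    // Assumption: Job.getInstance exists in this release; otherwise use the
    // deprecated constructor new Job(getConf(), "word count").
    Job job = Job.getInstance(getConf(), "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Submit the job and block until it finishes.
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses generic options (such as -D key=value) before calling run().
    System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
  }
}
```

Packaged into a jar, this would be launched with something like `hadoop jar wordcount.jar WordCountDriver <input> <output>`, where the jar name is hypothetical.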