I understand that classes in org.apache.hadoop.mapred are all part of the "old" map-reduce version 1 API, and that org.apache.hadoop.mapreduce are all part of the new version 2 API. I am running the CDH4.5 pre-built VM, which uses hadoop 2.0.0 and YARN. I want to run a map-reduce job that uses the new APIs, not the old.
So I looked at the javadoc examples in org.apache.hadoop.mapreduce.Job and org.apache.hadoop.util.Tool. The problem is that the example in the Tool javadocs still uses the old org.apache.hadoop.mapred classes. I tried following the example from the Job class, only to discover that all of its constructors were deprecated and it had no static factory methods. It looks like later javadocs do have static factory methods for Job, but my version does not.
So what is the new Job building/running pattern? Should I follow the example in Tool? Job? or something else?
If I follow the example in the Tool class, and use the old org.apache.hadoop.mapred classes, does that mean I lose all the advantages that map-reduce version 2 has over version 1? Or is it still leveraging the new architecture behind the scenes?
As far as I know, MapReduce V1 vs. V2 is not about the mapred or mapreduce API packages; it is about the platform that runs the job.
When you run a job on MapReduce V1, it is managed by the JobTracker and TaskTrackers. When you run it on MapReduce V2 (YARN), the JobTracker's functions are split among three separate services: the ResourceManager, per-application ApplicationMasters, and the JobHistoryServer.
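To make the point concrete: which platform executes the job is a cluster configuration choice, not an API choice in your Java code. A minimal sketch (assuming a stock Hadoop 2.x setup) of the relevant property in mapred-site.xml:

```xml
<!-- mapred-site.xml: selects the execution platform, independent of which
     Java package (mapred or mapreduce) the job's code is written against.
     "yarn" runs jobs on the MRv2/YARN services; "classic" would use the
     MRv1 JobTracker/TaskTracker runtime, and "local" runs in-process. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```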
You can get a more detailed understanding from the link below:
Example code and further explanation are available in the link below:
That means the example you see on the Tool class page, even though it uses the old classes, will still run on the new MapReduce V2 architecture.
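For the job building/running pattern itself, the Tool/ToolRunner approach works fine with the new API too. A minimal driver sketch, assuming hypothetical WordCountMapper and WordCountReducer classes written against org.apache.hadoop.mapreduce (on Hadoop 2.0.0 the Job constructor is deprecated but still functional; newer releases add a static Job.getInstance factory):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Driver using the new org.apache.hadoop.mapreduce API with Tool/ToolRunner.
// ToolRunner parses generic options (-D, -conf, -files, ...) before run() is called.
public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Deprecated on 2.0.0 but works; on later versions prefer
        // Job.getInstance(getConf(), "word count").
        Job job = new Job(getConf(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);   // hypothetical mapper
        job.setReducerClass(WordCountReducer.class); // hypothetical reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
    }
}
```

This is a sketch, not a definitive template; the mapper and reducer class names are placeholders you would replace with your own.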
Yes, I believe I know exactly what you're talking about ("mapred" vs. "mapreduce" classes). They are both considered part of the MapReduce v1 programming model.
I found this online course pretty helpful in providing a good template for creating map, reduce, and driver classes using the newer "mapreduce" API:
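The map and reduce classes in the newer API follow a consistent shape. A minimal word-count sketch, assuming Hadoop 2.x on the classpath (the main visible differences from the old API: you extend Mapper/Reducer classes rather than implement interfaces, a single Context object replaces OutputCollector/Reporter, and reduce() receives an Iterable instead of an Iterator):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper in the new API: emits (word, 1) for each whitespace-separated token.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE); // Context replaces OutputCollector/Reporter
            }
        }
    }
}

// Reducer in the new API: sums the counts for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```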