01-12-2015 07:55 AM - last edited on 01-13-2015 09:41 AM by Clint
I am planning to take the certification exam next month. Are the questions based on MR1 or MR2?
01-15-2015 08:07 PM
I would say MRv2 with the following caveat: it doesn't really matter to a developer since there is very little difference.
MRv1 is MapReduce tightly coupled to the Hadoop processing layer: the same daemons both manage cluster resources and run MapReduce jobs. YARN rewrote those internals so that multiple frameworks (Spark, Giraph, etc.) can run on the same cluster.
Of course, MapReduce was also ported to YARN, so developers write and run MapReduce processing code exactly like they did before. This is true of Hive and Pig as well; in any event, the underlying daemons and configuration changes are not visible or relevant to developers. The only developers for whom MRv2 is different are developers who are writing new frameworks for YARN (rather than writing code that runs on an existing framework in YARN, like MapReduce or Giraph), and this exam does not address that audience.
So there's no difference in the code. Various questions refer to daemons from HDFS, MRv1, and MRv2, so you should be familiar with, for example, the JobTracker, which is a decidedly MRv1 MapReduce daemon. Just knowing what it is and why it is or isn't present in a given setup is enough. Again, what matters is the code, and that doesn't change.
A related, older question is "old API versus new API": this exam uses the new API, and has for a few years.
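For anyone unsure which API is which, here is a minimal sketch of a mapper written against the new API. The class name `TokenCountMapper` and the word-splitting logic are my own illustration, not something taken from the exam. It assumes the Hadoop client libraries are on the classpath. The thing to notice is the import: new-API classes live in `org.apache.hadoop.mapreduce`, while the old API lives in `org.apache.hadoop.mapred`.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper; // new API package (old API: org.apache.hadoop.mapred)

// Hypothetical example class for illustration only.
// Emits (token, 1) for every whitespace-separated token in each input line.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 'value' holds one line of input; 'key' is its byte offset in the file.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            // In the new API, output goes through the Context object.
            context.write(word, ONE);
        }
    }
}
```

In the old API the same logic would implement the `org.apache.hadoop.mapred.Mapper` interface and write its output through an `OutputCollector`, with a `Reporter` passed alongside. Either way, the class compiles and runs the same whether the cluster underneath is MRv1 or MRv2.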