Support Questions

kk778j · 08-30-2016

I have developed a MapReduce job with the following setup:
- One input split, and thus one map task
- Map input key is Path and value is BytesWritable (a one-minute log file)
- Map output key is Text and value is also Text (a record from the one-minute file)
- The number of reducers is configured as 1

I know map outputs get sorted by the natural order of the key, but what is the order of records that share the same key? Is it first-in, first-out? Say the map emits the outputs for one key in FIFO order: will that same order be preserved when they reach the reducer? Also, could you let me know whether this behavior is the same in Hadoop 1.X and 2.X?

Note: I have not implemented a secondary sort.
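A secondary sort is the usual way to make the order of values within one key deterministic. The plain-Java sketch below models the idea without any Hadoop classes, so every name in it is hypothetical: `CompositeKey` carries the natural key plus a sequence number that the mapper would assign in emission order (with one input split and one map task, a simple per-task counter is enough), records are sorted on (key, seq), and grouping for the reducer uses only the natural key. For simplicity the sketch compares keys with `String.compareTo`. In a real job the same roles are played by a composite `WritableComparable` key and the comparators registered with `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass` (plus a partitioner on the natural key once there is more than one reducer).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical composite key: natural key + emission sequence number.
    static final class CompositeKey {
        final String key;
        final long seq;
        CompositeKey(String key, long seq) { this.key = key; this.seq = seq; }
    }

    // One map output record: composite key and Text-like value.
    static final class Pair {
        final CompositeKey ck;
        final String value;
        Pair(CompositeKey ck, String value) { this.ck = ck; this.value = value; }
    }

    // Sort comparator: natural key first, then sequence number.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey c) -> c.key).thenComparingLong(c -> c.seq);

    // Models shuffle + reduce grouping: sort on (key, seq), then group on
    // the natural key only, as a grouping comparator would.
    static Map<String, List<String>> shuffle(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort((a, b) -> SORT.compare(a.ck, b.ck));
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Pair p : sorted) {
            groups.computeIfAbsent(p.ck.key, k -> new ArrayList<>()).add(p.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Records arriving out of emission order; seq restores it.
        List<Pair> arrived = List.of(
            new Pair(new CompositeKey("k1", 3), "third"),
            new Pair(new CompositeKey("k2", 2), "only"),
            new Pair(new CompositeKey("k1", 0), "first"),
            new Pair(new CompositeKey("k1", 1), "second"));
        System.out.println(shuffle(arrived)); // {k1=[first, second, third], k2=[only]}
    }
}
```

Because the sequence number takes part in the sort, values under the same key reach the "reducer" in emission order regardless of the order they arrived in.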

cstanca · 12-28-2016

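For equal keys, the order of the values is not guaranteed (effectively random); otherwise, records are sorted by key. For Text keys that means a byte-wise comparison of the UTF-8 encoding, which matches String order for plain ASCII keys. The behavior is the same in Hadoop 1.X and 2.X.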
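To illustrate the key ordering, here is a minimal sketch of a byte-wise key comparison in plain Java. `compareUtf8` is a hypothetical helper, not a Hadoop method; it assumes Text compares keys lexicographically over their UTF-8 bytes treated as unsigned values. Note that it only ever reports "equal" for keys with equal bytes and says nothing about the values attached to those keys, which is why their relative order is left open.

```java
import java.nio.charset.StandardCharsets;

public class TextKeyOrder {

    // Lexicographic comparison of the UTF-8 bytes, each byte treated as
    // unsigned; a shorter key that is a prefix of a longer one sorts first.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        // ASCII keys: byte order and String order agree.
        System.out.println(compareUtf8("apple", "banana") < 0);   // true
        System.out.println("apple".compareTo("banana") < 0);      // true

        // U+FF61 vs U+1F600: String.compareTo compares UTF-16 code units
        // (0xFF61 vs 0xD83D), the byte order compares UTF-8 (0xEF vs 0xF0).
        String halfwidthStop = "\uFF61";
        String emoji = "\uD83D\uDE00";
        System.out.println(compareUtf8(halfwidthStop, emoji) < 0); // true
        System.out.println(halfwidthStop.compareTo(emoji) < 0);    // false
    }
}
```

For ordinary ASCII log keys the two orders coincide, so "sorted as a String" is a fair shorthand; the difference only shows up with characters outside the Basic Multilingual Plane.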

Cloudera Community

Hadoop MapReduce sorting order

hadoop files list sorted by time

How can I sort record in parquet file?

ALL hadoop-mapreduce-examples.jar fail cdh6

/usr/lib/hadoop-mapreduce/hadoop-streamingxxxx.jar...

What is the difference between Partitioner, Combin...

Order by Operator in Pig

Reading ORC files using Mapreduce

Spark DataFrame - difference between sort and orde...

Hadoop Security Concepts

Join Order from explain plan