How Map and Reduce operations are actually carried out

Solved

Explorer

Hi,

I came across this question. Can anyone please tell me the correct answer, with an explanation?

Which best describes how TextInputFormat processes input files and line breaks?


A. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
B. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
C. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
D. Input file splits may cross line breaks. A line that crosses file splits is ignored.
E. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.

Thanks in advance

8 Replies

Re: How Map and Reduce operations are actually carried out

Master Guru

Re: How Map and Reduce operations are actually carried out

Explorer
Hi Harish,

Thanks for the reply.
I had already gone through that link, but I couldn't understand it. Could you please explain it?

Re: How Map and Reduce operations are actually carried out

Master Guru
Which part of it was unclear, specifically? Could you quote it, so it can be
explained further?

Re: How Map and Reduce operations are actually carried out

Explorer
"For example TextInputFormat will read the last line of the FileSplit past the split boundary and, when reading other than the first FileSplit, TextInputFormat ignores the content up to the first newline." What does this mean?

Re: How Map and Reduce operations are actually carried out

Master Guru
It just means that when the split offset (starting point) is 0, i.e. the
start of the file, we read the first line. Otherwise (a non-zero offset,
i.e. a starting point in the middle of the file) we always skip the first
line, because we know that the previous split's reader always reads one
extra line at the end.

Does this help?
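
To make the rule above concrete, here is a minimal Python sketch. It is a simplified model of the behaviour described in this reply, not Hadoop's actual LineRecordReader code; the function name `read_split` and the sample data are made up for illustration. Under this rule, a line that straddles a split boundary is emitted only by the reader of the split in which the line begins, which corresponds to option A in the question.

```python
def read_split(data: bytes, start: int, length: int) -> list:
    """Return the lines a reader for the split [start, start + length) emits.

    Hypothetical helper modelling the skip-first-line / read-one-extra-line
    rule; it is not part of any Hadoop API.
    """
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: discard everything up to and including the
        # first newline. The previous split's reader handles that line.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # Keep reading while the next line starts at or before the split end,
    # so the last line is read in full even if it runs past the boundary.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


# Four lines, 26 bytes in total, cut into splits of 10 bytes each.
data = b"alpha\nbravo\ncharlie\ndelta\n"
split_size = 10
for start in range(0, len(data), split_size):
    length = min(split_size, len(data) - start)
    print(start, read_split(data, start, length))
# 0 [b'alpha', b'bravo']
# 10 [b'charlie', b'delta']
# 20 []
```

Here "bravo" begins at byte 6 and ends past byte 10, so the first split's reader reads it in full and the second split's reader skips its tail. "delta" begins exactly at byte 20, so it is the extra line read by the second split, and the third split's reader skips it and emits nothing. Every line is read exactly once.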

Re: How Map and Reduce operations are actually carried out

Explorer
Sorry Harish, could you please explain in detail, with a small example?

Re: How Map and Reduce operations are actually carried out

Master Guru
Perhaps check http://stackoverflow.com/a/14540272, which includes an
example.

Re: How Map and Reduce operations are actually carried out

Explorer
Thanks, Harish.