How does Hadoop handle a line record that spans a block boundary?

Hello Forum, I read the following statement at http://www.dummies.com/programming/big-data/hadoop/input-splits-in-hadoops-mapreduce/

"In cases where the last record in a block is incomplete, the input split includes location information for the next block and the byte offset of the data needed to complete the record".

Is this statement true? Thanks

1 REPLY

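Yes @Saravanan Selvam, in effect. An input split is a logical byte range (file, start offset, length), not a separate file, so a record that crosses a block boundary is never stored as a broken record. InputFormat.getSplits only computes the splits; the record boundaries are handled by the RecordReader (LineRecordReader in the case of TextInputFormat). A reader whose split does not start at offset 0 skips the partial first line, and every reader keeps reading past the end of its split until it finishes its last line, fetching the remaining bytes from the next block if needed. Each line is therefore processed exactly once.

It also depends on the file format and compression codec. A file compressed with a non-splittable codec such as gzip is read as a single split. SequenceFiles can be record compressed or block compressed, and they contain sync markers that let a reader find a record boundary from an arbitrary offset.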
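
To make the boundary handling concrete, here is a minimal Python sketch of how a line-oriented reader can deal with splits that cut through records. It is a simplified simulation, not Hadoop's actual code: the names `make_splits` and `read_split` are hypothetical, the "file" is an in-memory byte string, and only `\n` line endings are assumed.

```python
from typing import List, Tuple


def make_splits(total_len: int, split_size: int) -> List[Tuple[int, int]]:
    """Cut a file of total_len bytes into (start, length) byte ranges.

    Hypothetical helper: the ranges ignore record boundaries on purpose.
    """
    return [(start, min(split_size, total_len - start))
            for start in range(0, total_len, split_size)]


def read_split(data: bytes, start: int, length: int) -> List[bytes]:
    """Return the lines that belong to the split [start, start + length)."""
    end = start + length
    pos = start

    if start != 0:
        # Not the first split: discard everything up to and including the
        # first newline. That line was already read by the previous split.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1

    records = []
    # "<=" means a line starting exactly at the split end is read here;
    # the next split discards it as its first line.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            # Last line of the file without a trailing newline.
            records.append(data[pos:])
            pos = len(data)
        else:
            # May read past "end" to finish the record.
            records.append(data[pos:newline])
            pos = newline + 1
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    for start, length in make_splits(len(data), 8):
        print((start, length), read_split(data, start, length))
```

With the 8-byte splits in the demo, the first split reads past its end to finish `bravo`, and the second split skips the tail of `bravo` and yields only `charlie`. Whatever split size is used, every line comes out exactly once.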

I came across a brief and clear explanation of the same kind of question. Please do check it out.

https://stackoverflow.com/questions/14291170/how-does-hadoop-process-records-split-across-block-boun...

Hope it Helps!!