How does Hadoop handle a line record that spans a block boundary?


Hello Forum, I have read the following statement in

"In cases where the last record in a block is incomplete, the input split includes location information for the next block and the byte offset of the data needed to complete the record".

I would like to know whether this statement is true. Thanks


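Yes @Saravanan Selvam, in essence. A record that does not fit inside one block is not cut into a broken record and stored in a separate split file. Input splits are logical: InputFormat.getSplits computes them on the client, and each split only describes a byte range of the file (start offset and length) together with the hosts that hold the data. The boundary itself is handled by the RecordReader. For text files, LineRecordReader reads past the end of its split, fetching the remaining bytes from the next block, until it has completed its last line. Every split except the first skips its leading line, because the reader of the previous split has already consumed it. This way each line is read exactly once.

It also depends on the compression codec. A file compressed with a non-splittable codec such as gzip is processed as a single split. SequenceFiles can be record compressed or block compressed, and they contain sync markers that let a reader starting at an arbitrary offset find the next record boundary.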
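To illustrate how a line reader can cope with a record that crosses a split boundary, here is a minimal Python sketch. It is a simplified model of the idea and not Hadoop's actual code: the names read_split and make_splits are made up for this example, the "file" is just a byte string, and lines are assumed to end with "\n".

```python
from typing import List, Tuple


def make_splits(total: int, block_size: int) -> List[Tuple[int, int]]:
    """Cut a file of `total` bytes into logical (start, length) splits."""
    return [(start, min(block_size, total - start))
            for start in range(0, total, block_size)]


def read_split(data: bytes, start: int, length: int) -> List[bytes]:
    """Return the lines that the reader of split [start, start + length) emits."""
    end = start + length
    pos = start

    # Every split except the first throws away its leading line. That line
    # (partial or not) is emitted by the reader of the previous split.
    if start != 0:
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []  # the rest of the file belongs to the previous split
        pos = newline + 1

    lines = []
    # A line is emitted if it STARTS at or before `end`; reading it may run
    # past `end`, i.e. into the next block.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


data = b"alpha\nbravo charlie\ndelta\necho\n"
per_split = [read_split(data, s, n) for s, n in make_splits(len(data), 8)]
# per_split == [[b"alpha", b"bravo charlie"], [], [b"delta"], [b"echo"]]
```

With an 8-byte block size, "bravo charlie" starts in the first split and ends in the third block. The first reader finishes it by reading past its own end, the second reader finds no line that it owns, and no line is lost or duplicated.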

I came across a brief and clear explanation of the same kind of question. Please do check it.

Hope it Helps!!