New Contributor
Posts: 3
Registered: ‎06-19-2015

Hadoop input split for a compressed block

If I have a splittable compressed file of 1 GB, and the default block size and input split size are both 128 MB, then 8 blocks and 8 input splits are created. When a compressed block is read by MapReduce it is decompressed, and say the block becomes 200 MB after decompression. But the input split assigned to it is only 128 MB, so how is the remaining 72 MB processed?

 

1. Is it processed by the next input split?

2. Is the size of the same input split increased?

Posts: 1,896
Kudos: 433
Solutions: 303
Registered: ‎07-31-2013

Re: Hadoop input split for a compressed block

The input split size is just a unit of work division. A task is not bound by any limit that prevents it from reading more or less than that.

I/O-wise you are still reading 128 MB, but those bytes expand in memory as they are decompressed.

Of your two choices, (2) is closer to what actually happens: the same task simply ends up handling more decompressed data than the split size.
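
To make this concrete, here is a small Python sketch. It is not Hadoop code, only a toy model under a few assumptions: independently compressed zlib blocks stand in for a splittable codec, the helper names (build_blocks, plan_splits, run_task) are made up for illustration, and a task is assumed to handle every block whose first byte falls inside its split. Splits are planned over the compressed bytes, yet each task produces far more decompressed bytes than it reads.

```python
import zlib

def build_blocks(records, records_per_block):
    """Compress groups of records independently so each block decodes on its own."""
    blocks = []
    for i in range(0, len(records), records_per_block):
        raw = b"".join(records[i:i + records_per_block])
        blocks.append(zlib.compress(raw))
    return blocks

def plan_splits(total_size, split_size):
    """Divide the compressed byte range [0, total_size) into fixed-size splits."""
    return [(start, min(start + split_size, total_size))
            for start in range(0, total_size, split_size)]

def run_task(blocks, split):
    """Decompress every block whose first byte lies inside the split.

    Returns (compressed bytes read, decompressed bytes produced).
    """
    start, end = split
    offset = 0
    read = produced = 0
    for block in blocks:
        if start <= offset < end:
            read += len(block)
            produced += len(zlib.decompress(block))
        offset += len(block)
    return read, produced

# Synthetic, highly compressible input (an assumption for the demo).
records = [b"user=%06d,status=ok\n" % i for i in range(20000)]
blocks = build_blocks(records, records_per_block=500)
total_compressed = sum(len(b) for b in blocks)

# Splits are defined purely in terms of compressed bytes on disk.
splits = plan_splits(total_compressed, split_size=total_compressed // 4 + 1)
results = [run_task(blocks, s) for s in splits]

for split, (read, produced) in zip(splits, results):
    print(split, "read", read, "compressed bytes ->", produced, "decompressed bytes")
```

No data is handed over to the next split and no split is resized on disk. Each task just sees more bytes after decompression than the byte range it was assigned.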