Support Questions


I have a question related to writing data to a datanode.

New Contributor

Example: 300 MB of data, split into 128 MB + 128 MB + 44 MB. My question: will the third block wait to receive another 84 MB of data, or will it find free space and write the 44 MB to the datanode right away?

1 ACCEPTED SOLUTION

Super Collaborator

An HDFS block corresponds to one file in the local file system on a datanode. So regardless of the total data size, the data is broken into 128 MB data files (the default block size) stored in the local file system. The last 44 MB chunk is also written to its own data file, so you will find block files (data files) like the following in your datanode's local file system.

Here is an example:

/hadoop/data/dfs/data/
├── current
│   ├── BP-1079595417-192.168.2.45-1412613236271
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir1
│   │   │   │           ├── blk_1073741825 (128 MB)
│   │   │   │           ├── blk_1073741826 (128 MB)
│   │   │   │           └── blk_1073741827 (44 MB)

Look for the 'dfs.datanode.data.dir' property in the HDFS configuration. It tells you where these files (which represent HDFS blocks) are located on a datanode's local file system.
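To make this concrete, here is a minimal sketch (my own example, not from the answer above) that uses the Hadoop Java client to list a file's blocks and their lengths. The path /data/sample_300mb.dat is hypothetical, and the lengths you see depend on the dfs.blocksize configured on your cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();            // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/sample_300mb.dat");      // hypothetical 300 MB file
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per HDFS block; for a 300 MB file with a 128 MB block
        // size you would expect lengths of 128 MB, 128 MB and 44 MB.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d MB%n",
                    b.getOffset(), b.getLength() / (1024 * 1024));
        }
        fs.close();
    }
}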


REPLIES



@Gowrisankar Periyasamy

HDFS allocates space one block at a time, and a block belongs to a file. If a file takes up only part of a block at the end, then that block (and its replicas) remains unfilled until an append is done to the file. If you append to the file, the last block of the file (and its replicas) is used to hold the appended data until the block is full.

For very large files (which is mostly why people use Hadoop), having at most <blocksize> MB (plus replicas) of space unused is not much of a concern. For example, a 99.9 GB file would occupy 799 full blocks (at 128 MB/block) plus one block that is only 20% full. That equates to about 0.1% unused space for that file.
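As a quick sanity check of the numbers above, here is a small sketch (added for illustration, not part of the original reply) that reproduces the arithmetic for the 99.9 GB example:

public class BlockWaste {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;                  // 128 MB default block size
        long fileSize  = (long) (99.9 * 1024 * 1024 * 1024);  // ~99.9 GB example file

        long fullBlocks = fileSize / blockSize;               // completely filled blocks
        long lastBlock  = fileSize % blockSize;               // bytes in the final, partial block
        long unused     = blockSize - lastBlock;              // space left in that last block

        System.out.printf("full blocks: %d%n", fullBlocks);                            // ~799
        System.out.printf("last block fill: %.0f%%%n", 100.0 * lastBlock / blockSize); // ~20%
        System.out.printf("unused fraction: %.2f%%%n", 100.0 * unused / fileSize);     // ~0.10%
    }
}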

New Contributor

Emaxwell, as per my example, it will write the 44 MB of data and then append more data to that block whenever a client request comes back again, right?

If the client doesn't come back, will the unused 84 MB be wasted or simply not used? Is my understanding correct?

Super Collaborator

No, it will not append data. Instead, it will create a new block, i.e. a new data file in the datanode's local file system.
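If you want to verify this on your own cluster, a rough sketch along these lines (my assumption of how one might check, not from the reply) appends 1 MB to an existing file and prints the block list before and after. The path is hypothetical, and append must be supported and enabled on the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendCheck {
    static void printBlocks(FileSystem fs, Path p) throws Exception {
        FileStatus st = fs.getFileStatus(p);
        BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
        System.out.println(p + ": " + blocks.length + " block(s), last block "
                + blocks[blocks.length - 1].getLength() + " bytes");
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/sample_300mb.dat");      // hypothetical existing file

        printBlocks(fs, file);                               // block layout before the append

        try (FSDataOutputStream out = fs.append(file)) {     // re-opens the file for append
            out.write(new byte[1024 * 1024]);                // append 1 MB of zeros
        }

        printBlocks(fs, file);                               // does the last block grow, or is a new one added?
        fs.close();
    }
}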

Super Guru

@Gowrisankar Periyasamy

By design it won't wait for the next 84 MB of data; it will write the 44 MB block directly. HDFS blocks are logical entities; internally, writes use the underlying ext3/ext4 disk blocks.

New Contributor

As per my understanding, whatever data comes from the client is split, and writing to a datanode starts as soon as free space is found there. It won't append to any data left over from last time. Please correct me if my understanding is wrong.

Super Collaborator

This is kind of old, but it will give you a clear picture: http://www.formhadoop.es/img/HDFS-comic.pdf

Enjoy.