Support Questions


I have a question related to writing data to a datanode.

New Contributor

Example: 300 MB of data, split into 128 MB + 128 MB + 44 MB. My question: will the third block wait to receive another 84 MB of data, or will it find free space and write the 44 MB to the datanode right away?

1 ACCEPTED SOLUTION

Super Collaborator

An HDFS block corresponds to one file in the local file system on a datanode. So regardless of the total data size, the data is broken into 128 MB data files (the default block size) stored in the local file system. The last 44 MB chunk is also written to its own data file, so you will find block files (data files) like the following in your datanode's local file system.

Here is an example:

/hadoop/data/dfs/data/
├── current
│   ├── BP-1079595417-192.168.2.45-1412613236271
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir1
│   │   │   │           ├── blk_1073741825 (128 MB)
│   │   │   │           ├── blk_1073741826 (128 MB)
│   │   │   │           └── blk_1073741827 (44 MB)

Look for the 'dfs.datanode.data.dir' property in the HDFS configuration. It tells you where these files (which represent HDFS blocks) are located on a datanode's local file system.
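To make this concrete, here is a minimal sketch (my own example, not from the answer above) that uses the Hadoop Java client to list a file's blocks and their lengths. The path /data/sample_300mb.dat is hypothetical, and the lengths you see depend on the dfs.blocksize configured on your cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();            // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/sample_300mb.dat");      // hypothetical 300 MB file
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per HDFS block; for a 300 MB file with a 128 MB block
        // size you would expect lengths of 128 MB, 128 MB and 44 MB.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d MB%n",
                    b.getOffset(), b.getLength() / (1024 * 1024));
        }
        fs.close();
    }
}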


REPLIES



@Gowrisankar Periyasamy

HDFS allocates space one block at a time, and a block belongs to a file. If a file takes up only part of a block at the end, then that block (and its replicas) remains unfilled until an append is done to the file. If you append to the file, the last block of the file (and its replicas) is used to hold the appended data until the block is full.

For very large files (which is mostly why people use Hadoop), having at most <blocksize> MB (plus replicas) of space unused is not much of a concern. For example, a 99.9 GB file would occupy 799 full blocks (at 128 MB/block) plus one block that is only 20% full. That equates to about 0.1% unused space for that file.
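As a quick sanity check of the numbers above, here is a small sketch (added for illustration, not part of the original reply) that reproduces the arithmetic for the 99.9 GB example:

public class BlockWaste {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;                  // 128 MB default block size
        long fileSize  = (long) (99.9 * 1024 * 1024 * 1024);  // ~99.9 GB example file

        long fullBlocks = fileSize / blockSize;               // completely filled blocks
        long lastBlock  = fileSize % blockSize;               // bytes in the final, partial block
        long unused     = blockSize - lastBlock;              // space left in that last block

        System.out.printf("full blocks: %d%n", fullBlocks);                            // ~799
        System.out.printf("last block fill: %.0f%%%n", 100.0 * lastBlock / blockSize); // ~20%
        System.out.printf("unused fraction: %.2f%%%n", 100.0 * unused / fileSize);     // ~0.10%
    }
}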

New Contributor

Emaxwell, as per my example, it will write the 44 MB of data and then append more data to that block whenever a client request comes back again, right?

If the client doesn't come back, will the unused 84 MB be wasted or simply not used? Is my understanding correct?

Super Collaborator

No, it will not append data. Instead, it will create a new block, i.e. a new data file in the datanode's local file system.
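If you want to verify this on your own cluster, a rough sketch along these lines (my assumption of how one might check, not from the reply) appends 1 MB to an existing file and prints the block list before and after. The path is hypothetical, and append must be supported and enabled on the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendCheck {
    static void printBlocks(FileSystem fs, Path p) throws Exception {
        FileStatus st = fs.getFileStatus(p);
        BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
        System.out.println(p + ": " + blocks.length + " block(s), last block "
                + blocks[blocks.length - 1].getLength() + " bytes");
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/sample_300mb.dat");      // hypothetical existing file

        printBlocks(fs, file);                               // block layout before the append

        try (FSDataOutputStream out = fs.append(file)) {     // re-opens the file for append
            out.write(new byte[1024 * 1024]);                // append 1 MB of zeros
        }

        printBlocks(fs, file);                               // does the last block grow, or is a new one added?
        fs.close();
    }
}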

Super Guru

@Gowrisankar Periyasamy

By design it won't wait for the next 84 MB of data; it will write the 44 MB block directly. HDFS blocks are logical entities; internally, writes use the underlying ext3/ext4 disk blocks.

New Contributor

As per my understanding, whatever data comes from the client is split, and writing to a datanode starts as soon as free space is found there. It won't append to any data left over from last time. Please correct me if my understanding is wrong.

Super Collaborator

This is kind of old, but it will give you a clear picture: http://www.formhadoop.es/img/HDFS-comic.pdf

Enjoy.