Disk size used is bigger than replication number multiplied by files size

Contributor

Hi,

I am running Hadoop on a 3-node cluster (3 virtual machines) with 20 GB, 10 GB and 10 GB of disk space available, respectively.

When I run this command on the NameNode:

hadoop fs -df -h /

I get the following result:

[Screenshot 13803-1.png: output of hadoop fs -df -h /]

When I run this command:

hadoop fs -du -s -h /

I get the following result:

[Screenshot 13804-2.png: output of hadoop fs -du -s -h /]

Knowing that the replication factor is set to 3, shouldn't I get 3 * 2.7 = 8.1 G in the first screenshot?
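
Written out as a calculation, this expectation is simply the logical size times the replication factor. A minimal sketch in Python (the function name is hypothetical, and it assumes, as the question does, that the 2.7 G figure is the size before replication):

```python
def expected_raw_usage(logical_size: float, replication: int) -> float:
    """Raw space consumed if every byte is stored `replication` times."""
    return logical_size * replication


logical_size_g = 2.7  # figure quoted in the question
replication = 3       # replication factor quoted in the question

# Format to one decimal place to hide floating-point noise.
print(f"{expected_raw_usage(logical_size_g, replication):.1f} G")  # prints "8.1 G"
```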

I tried running the expunge command, but it did not change the result.

Thanks in advance!

Sylvain.

1 ACCEPTED SOLUTION

Super Guru

@dvt isoft

Not necessarily. That would hold only if your blocks were 100% filled with data.

Let's say you have a 1024 MB file and the block size is 128 MB. That would be exactly 8 blocks at 100%.

Let's say you have a 968 MB file and the block size is 128 MB. That is still 8 blocks, but the last one holds only 72 MB. A block once used by a file cannot be reused for a different file.

That's why loading small files could be a waste.

Just imagine 100 files of 100 KB each: they will use 100 blocks (block size 128 MB), more than 10x as many as in either example above.
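
The block counts in these examples can be checked with a short calculation. This is a minimal sketch in Python, not anything Hadoop ships: the function names are hypothetical, and it only counts blocks per file for the 128 MB block size used in the examples. It does not model how many bytes each block occupies on disk.

```python
KB = 1024
MB = 1024 * KB
BLOCK_SIZE = 128 * MB  # block size assumed in the examples above


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks one file is split into (ceiling division)."""
    if file_size <= 0:
        return 0
    return (file_size + block_size - 1) // block_size


def last_block_fill(file_size: int, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of the final block that actually holds data."""
    if file_size <= 0:
        return 0.0
    remainder = file_size % block_size
    return 1.0 if remainder == 0 else remainder / block_size


print(block_count(1024 * MB), last_block_fill(1024 * MB))  # 8 blocks, last one full
print(block_count(968 * MB), last_block_fill(968 * MB))    # 8 blocks, last one 72/128 full

# Blocks never span files, so 100 small files need 100 blocks.
print(sum(block_count(100 * KB) for _ in range(100)))
```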

You need to understand your files, their block usage percentage, etc.

The command you execute shows the blocks empty x size/block ... I know that is confusing 🙂

+++

If this is helpful please vote and accept as the best answer.

3 REPLIES

Rising Star

Could you please check whether the screenshots were uploaded properly? They are not visible on my end.

Contributor

It should be alright now.
