
Disk space used is bigger than replication factor multiplied by file size

Explorer

Hi,

I am running Hadoop on a 3-node cluster (3 virtual machines) with 20 GB, 10 GB, and 10 GB of disk space available, respectively.

When I run this command on the NameNode:

hadoop fs -df -h /

I get the following result:

[screenshot: output of hadoop fs -df -h /]

When I run this command:

hadoop fs -du -s -h /

I get the following result:

[screenshot: output of hadoop fs -du -s -h /]

Given that the replication factor is set to 3, shouldn't the first screenshot show 3 × 2.7 GB = 8.1 GB used?

I tried executing the expunge command, and it did not change the result.

Thanks in advance !

Sylvain.

1 ACCEPTED SOLUTION

@dvt isoft

Not necessarily. That would only be the case if every block were 100% filled with data.

Let's say you have a 1024 MB file and the block size is 128 MB. That is exactly 8 blocks, each 100% full.

Let's say you have a 968 MB file and the block size is 128 MB. That is still 8 blocks, but the last one is only partially filled. A block once used by a file cannot be reused for a different file.

That's why loading many small files can be wasteful.

Just imagine 100 files of 100 KB each: they would occupy 100 blocks of 128 MB, more than ten times the number of blocks in the examples above.
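As a quick sanity check (a minimal sketch; the 128 MB block size is the usual default, but yours may differ), the block count is just a ceiling division, which any shell can verify:

# blocks needed = ceiling(file size / block size), sizes in MB
echo $(( (1024 + 127) / 128 ))   # 1024 MB file -> 8 blocks
echo $(( (968 + 127) / 128 ))    # 968 MB file -> 8 blocks
# 100 files of 100 KB each take 1 block apiece -> 100 blocks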

You need to understand your files and how full each of their blocks is.
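If you want to see that per-file block breakdown on your own cluster, fsck can report it (a sketch assuming you run it with HDFS superuser rights; the path / here is just an example):

hdfs fsck / -files -blocks

This lists every file with its size and the blocks allocated to it, which makes mostly empty last blocks easy to spot.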

The command you executed reports the number of blocks (including partially empty ones) multiplied by the block size ... I know that is confusing 🙂

+++

If this is helpful, please vote and accept it as the best answer.


3 REPLIES

Contributor

Can you please check whether the screenshots were uploaded properly? They are not visible on this end.

Explorer

It should be alright now.
