
Disk size used is bigger than replication number multiplied by files size

Explorer

Hi,

I am running Hadoop on a 3-node cluster (3 virtual machines) with 20 GB, 10 GB, and 10 GB of disk space available, respectively.

When I run this command on the NameNode:

hadoop fs -df -h /

I get the following result:

[Screenshot: output of hadoop fs -df -h /]

When I run this command:

hadoop fs -du -s -h /

I get the following result:

[Screenshot: output of hadoop fs -du -s -h /]

Knowing that the replication factor is set to 3, shouldn't the first screenshot show 3 × 2.7 GB = 8.1 GB used?

I tried executing the expunge command (hadoop fs -expunge), and it did not change the result.

Thanks in advance!

Sylvain.

1 ACCEPTED SOLUTION

Re: Disk size used is bigger than replication number multiplied by files size

@dvt isoft

Not necessarily. That would only be the case if every block were 100% filled with data.

Let's say you have a 1024 MB file and the block size is 128 MB. That is exactly 8 blocks, each 100% full.

Let's say you have a 968 MB file and the block size is 128 MB. That is still 8 blocks, but the last block is only partially filled. A block once used by a file cannot be reused for a different file.

That's why loading many small files can be a waste.

Just imagine 100 files of 100 KB each: they would occupy 100 blocks of 128 MB, more than 10x the blocks used in the examples above.
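The block arithmetic above can be sketched in a few lines of Python (assuming the default 128 MB HDFS block size; each file rounds up to whole blocks, since a block is never shared between files):

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default HDFS block size: 128 MB
MB = 1024 * 1024
KB = 1024

def blocks_needed(file_size):
    # A file occupies ceil(size / block_size) blocks;
    # the last block may be only partially filled.
    return math.ceil(file_size / BLOCK_SIZE)

print(blocks_needed(1024 * MB))  # 1024 MB file -> 8 blocks, all full
print(blocks_needed(968 * MB))   # 968 MB file  -> still 8 blocks, last one ~56% full
print(sum(blocks_needed(100 * KB) for _ in range(100)))  # 100 small files -> 100 blocks
```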

You need to understand your files and their per-block usage percentage.

The command you executed reports usage in terms of allocated blocks (number of blocks × block size), including the empty part of each block ... I know that is confusing 🙂

+++

If this is helpful, please vote and accept it as the best answer.


3 REPLIES

Re: Disk size used is bigger than replication number multiplied by files size

Contributor

Can you please check whether the screenshots were uploaded properly? They are not visible on this end.

Re: Disk size used is bigger than replication number multiplied by files size

Explorer

It should be alright now.
