
Disk size used is bigger than replication number multiplied by files size


Explorer

Hi,

I am running Hadoop on a 3-node cluster (3 virtual machines) with 20 GB, 10 GB and 10 GB of disk space available, respectively.

When I run this command on the NameNode:

hadoop fs -df -h /

I get the following result:

[Screenshot 13803-1.png: output of hadoop fs -df -h /]

When I run this command:

hadoop fs -du -s -h /

I get the following result:

[Screenshot 13804-2.png: output of hadoop fs -du -s -h /]

Given that the replication factor is set to 3, shouldn't the first screenshot show 3 * 2.7 G = 8.1 G used?

I tried running the expunge command (hadoop fs -expunge), but it did not change the result.

Thanks in advance!

Sylvain.

1 ACCEPTED SOLUTION


Re: Disk size used is bigger than replication number multiplied by files size

@dvt isoft

Not necessarily. That would hold only if all your blocks were 100% filled with data.

Let's say you have a 1024 MB file and the block size is 128 MB. That would be exactly 8 blocks at 100%.

Let's say you have a 968 MB file and the block size is 128 MB. That is still 8 blocks, but the last one is only partly filled. A block once used by a file cannot be reused for a different file.
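
The two cases above work out as follows. This is only a minimal Python sketch of the arithmetic: block_layout is a hypothetical helper, not a Hadoop API, and the 128 MB block size is the one assumed in the examples. It counts blocks and reports how full the last one is; it does not measure what is actually stored on the DataNodes.

```python
from typing import Tuple

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # block size assumed in the examples above


def block_layout(file_size: int, block_size: int = BLOCK_SIZE) -> Tuple[int, float]:
    """Return (number of blocks, fill fraction of the last block) for one file.

    Hypothetical helper for illustration only; sizes are in bytes.
    """
    if file_size <= 0:
        return 0, 0.0
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = (file_size + block_size - 1) // block_size
    last_block_bytes = file_size - (blocks - 1) * block_size
    return blocks, last_block_bytes / block_size


for size_mb in (1024, 968):
    blocks, last_fill = block_layout(size_mb * MB)
    print(f"{size_mb} MB file: {blocks} blocks, last block fill = {last_fill:.4f}")
```

Both files come out at 8 blocks; the difference is only in how full the last block is.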

That's why loading small files could be a waste.

Just imagine: 100 files of 100 KB each will use 100 blocks of 128 MB, more than 10x the number of blocks in the examples I gave above.
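
To put numbers on that comparison, here is a rough Python sketch. blocks_needed is a hypothetical helper written for this post, not part of Hadoop, and it assumes the same 128 MB block size. It only counts blocks (every non-empty file gets at least one block of its own) and makes no claim about bytes used on disk.

```python
KB = 1024
MB = 1024 * KB
BLOCK_SIZE = 128 * MB  # assumed block size, as in the examples above


def blocks_needed(file_sizes, block_size=BLOCK_SIZE):
    """Total number of blocks for a list of file sizes given in bytes.

    Blocks are never shared between files, so each file is rounded up
    to a whole number of blocks on its own.
    """
    return sum((size + block_size - 1) // block_size for size in file_sizes)


small_files = [100 * KB] * 100   # 100 files of 100 KB each
one_big_file = [1024 * MB]       # the single 1024 MB file from the first example

small_count = blocks_needed(small_files)
big_count = blocks_needed(one_big_file)
print(f"small files: {small_count} blocks, big file: {big_count} blocks")
print(f"ratio: {small_count / big_count}x")
```

The small files hold under 10 MB of data in total, yet they need far more blocks than the 1024 MB file.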

You need to understand your files, how full their blocks are, and so on.

The command you ran counts the empty part of each block too, i.e. number of blocks x block size ... I know that is confusing :)


3 REPLIES

Re: Disk size used is bigger than replication number multiplied by files size

Contributor

Can you please check whether the screenshots uploaded properly? They are not visible on my end.


Re: Disk size used is bigger than replication number multiplied by files size

Explorer

It should be alright now.

