Explorer
Posts: 7
Registered: ‎02-03-2017

HDFS disk usage doubled after upgrade

 

I upgraded Cloudera Manager from 5.5.3 to 5.10.0, then upgraded CDH from 5.5.1 to 5.8.4. After these operations, the disk usage of every DataNode on the Hosts -> All Hosts page increased. In the HDFS file browser and with CLI commands, almost every directory shows double its previous size, but I see no difference in file counts, types, names, etc. The same is true when I check disk usage from the Linux terminal. I am a little confused and need help figuring out what happened.

Champion
Posts: 176
Registered: ‎05-16-2016

Re: HDFS disk usage doubled after upgrade

[ Edited ]

Curious to know whether reinstalling the same Cloudera Manager Server version you were previously running solved the issue?

Explorer
Posts: 7
Registered: ‎02-03-2017

Re: HDFS disk usage doubled after upgrade

@csguna I haven't tried that, and probably would not be able to.
Explorer
Posts: 7
Registered: ‎02-03-2017

Re: HDFS disk usage doubled after upgrade

[ Edited ]

An update: I was mistaken about some of the values.

 

The sizes shown in the HDFS file browser and returned by hdfs dfsadmin -report are as expected. But Cloudera's metrics and charts continue to show increasing values, and du -sch on the dfs directories in the Linux terminal also gives large numbers. I also noticed the increase had started a couple of days before the upgrade I mentioned, so it's unlikely that something went wrong with the upgrade.
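One thing I had to remind myself of while comparing these numbers: the tools measure different things, so some gap between them is normal. A minimal sanity check, with assumed numbers (nothing here is measured from our cluster):

```shell
# The HDFS file browser and `hdfs dfsadmin -report` show logical,
# pre-replication sizes, while `du -sch` on the DataNode data directories
# counts every replica on disk. With default replication a ~3x gap between
# the two views is expected.

logical_gb=100   # assumed logical size, e.g. from the HDFS file browser
replication=3    # default dfs.replication; verify with: hdfs getconf -confKey dfs.replication

expected_raw_gb=$((logical_gb * replication))
echo "expected raw (on-disk) usage: ~${expected_raw_gb} GB"
```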

 

Recently we were informed by another HDFS user that they have been splitting large files into smaller ones for a computing performance increase (??), which got me thinking: if they are splitting TBs of data into pieces mostly smaller than the block size (128 MB), could that cause usage on the file system to grow more than 3x?
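To put my worry in numbers, here is the back-of-the-envelope math under the assumption that a file smaller than a block would still reserve a whole block on disk (all figures are made up for illustration):

```python
# Hypothetical worst-case arithmetic (all numbers assumed, not measured):
# IF every file smaller than a block still reserved a full 128 MB block on
# disk, splitting data into 50 MB pieces would inflate usage like this.

BLOCK_MB = 128          # dfs.blocksize mentioned in this thread
DATA_MB = 1_000_000     # ~1 TB of logical data (assumed)
PIECE_MB = 50           # assumed size of the split files

n_pieces = DATA_MB // PIECE_MB               # 20000 pieces
feared_usage_mb = n_pieces * BLOCK_MB        # if each piece pinned a whole block
print(feared_usage_mb / DATA_MB)             # 2.56x under that assumption
```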

 

Am I correct on this estimation?

Explorer
Posts: 11
Registered: ‎03-25-2017

Re: HDFS disk usage doubled after upgrade

I don't think breaking large files into smaller ones should result in a space increase, because an HDFS block on disk only occupies the space its data actually needs, not the full configured block size.

 

For example:

For example: if the block size is 128MB and a file writes only 50MB into a block, that block consumes roughly 50MB on the DataNode's local disk; the remaining 78MB is never allocated.
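A small sketch of that point with assumed numbers: since a block file on the DataNode's local disk only occupies the bytes actually written, the raw footprint depends on total data and replication, not on how the data is split (the replication factor of 3 below is an assumption):

```python
# Sketch of why splitting files should not inflate raw disk usage (assumed
# values): a 50 MB file in a 128 MB block consumes ~50 MB per replica on
# the DataNode's local disk, not 128 MB.

BLOCK_MB = 128
REPLICATION = 3         # default dfs.replication (assumed)

def raw_usage_mb(file_sizes_mb, replication=REPLICATION):
    # Disk usage is the sum of actual bytes times the replication factor;
    # the block size only caps how much data a single block can hold.
    return sum(file_sizes_mb) * replication

one_big = raw_usage_mb([1000])           # one 1000 MB file
many_small = raw_usage_mb([50] * 20)     # the same data split into 50 MB files
print(one_big, many_small)               # both 3000: same raw footprint
```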
