03-20-2017 02:16 AM
I upgraded Cloudera Manager from 5.5.3 to 5.10.0 and then upgraded CDH from 5.5.1 to 5.8.4. After these operations, I saw that the disk usage of all DataNodes on the Hosts->All Hosts page had increased. In the HDFS file browser and with CLI commands, I see that almost every directory is double its previous size, but I noticed no difference in the file counts, types, names, etc. I see the same thing when I check disk usage in a Linux terminal. I am a little confused and need help figuring out what happened.
03-21-2017 12:39 AM - edited 03-21-2017 12:51 AM
Curious to know whether reinstalling the same Cloudera Manager Server version that you were previously running solved the issue?
03-24-2017 04:45 AM - edited 03-24-2017 06:40 AM
An update: I was mistaken on some values.
The size values in the HDFS file browser and those returned by hdfs dfsadmin -report are the expected values. But the Cloudera metrics and charts continue to show increasing values. The du -sch output on the dfs folders in a Linux terminal also gives large numbers. I also noticed that the increase started a couple of days before the upgrade I mentioned, so it's unlikely that something went wrong with the upgrade.
Recently, another HDFS user informed us that they have been splitting large files into smaller ones to improve computing performance (??). This had me thinking: if they are splitting TBs of data into smaller files, most of them even smaller than the block size (128 MB), could that be causing usage on the file system to grow by more than 3x?
Am I correct in this estimation?
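To sanity-check this estimate, here is a rough back-of-the-envelope sketch in Python. It assumes that a block file on a DataNode occupies only the bytes actually written to it rather than a full 128 MB, which as far as I know is how HDFS stores blocks. The function name raw_usage_bytes is made up for illustration, and the replication factor of 3, 512 bytes per checksum chunk, and 4-byte checksums are assumed defaults that may differ on a given cluster. Small per-block headers and local file system rounding are ignored.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB block size, as mentioned in the post


def raw_usage_bytes(file_sizes, replication=3, bytes_per_checksum=512,
                    checksum_size=4, block_size=BLOCK_SIZE):
    """Estimate raw DataNode disk usage for a list of HDFS file sizes.

    Assumption: each block file holds only the data written to it, plus
    one checksum per chunk of bytes_per_checksum bytes. Defaults are
    assumed values, not read from any cluster configuration.
    """
    total = 0
    for size in file_sizes:
        # A file is stored as zero or more full blocks plus one partial block.
        full_blocks, remainder = divmod(size, block_size)
        block_lengths = [block_size] * full_blocks
        if remainder:
            block_lengths.append(remainder)
        for length in block_lengths:
            checksum_bytes = math.ceil(length / bytes_per_checksum) * checksum_size
            total += length + checksum_bytes
    # Every block is stored once per replica.
    return total * replication


# One 1 GiB file versus the same data split into 1024 files of 1 MiB each.
one_large = raw_usage_bytes([1 << 30])
many_small = raw_usage_bytes([1 << 20] * 1024)
print(one_large, many_small, many_small / one_large)
```

Under these assumptions, splitting alone leaves the raw usage essentially unchanged, so the growth may have another cause, for example extra copies of the data being kept alongside the split files. The real cost of many small files would show up mainly as NameNode metadata rather than DataNode disk space.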