Created 05-08-2015 08:27 AM
I just switched from Cloudera 4 to Cloudera 5, and the output format for hdfs dfs -du has changed. It now has two columns instead of just one. I'm guessing that the first is the actual content size and the second is the block storage consumption, but I can't find any documentation about this. Can anyone clarify and/or point me in the right direction?
Created 05-10-2015 04:45 PM
Created 05-11-2015 10:26 AM
I'm on Cloudera 5.3.3. Here's my command line and output:
[hadoop]$ hdfs dfs -du /
2298676940886 6896030822658 /output
21297905593 63893716779 /tmp
6072184915396 18216555409976 /user
Created 05-12-2015 07:55 PM
I checked on CDH 5.2 and the output there was the same as in CDH 4. The two-column output you see was introduced in CDH 5.3; the JIRA is HADOOP-6857. The second column is supposed to show the raw disk usage of the file or directory.
So if a file is 42 bytes, it correctly shows (file size) * (replication factor of file):
$ hadoop fs -du /hbase/hbase.id
42 126 /hbase/hbase.id
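To make the relationship between the two columns concrete, here is a minimal Python sketch that parses one line of the two-column output and derives the replication factor it implies. The helper names parse_du_line and implied_replication are made up for illustration, and the sketch assumes plain byte counts (no -h flag) separated by whitespace.

```python
def parse_du_line(line):
    """Split one line of two-column `-du` output into (size, raw, path).

    Assumes plain byte counts, i.e. the command was run without -h.
    """
    size, raw, path = line.split(None, 2)
    return int(size), int(raw), path


def implied_replication(line):
    """Return column 2 divided by column 1, or None for empty entries."""
    size, raw, _ = parse_du_line(line)
    return raw / size if size else None


# The single-file example from this thread: 42 bytes stored with 3 replicas.
line = "42 126 /hbase/hbase.id"
size, raw, path = parse_du_line(line)
print(path, size, raw, implied_replication(line))
```

For this file the implied factor is 3, which matches (file size) * (replication factor).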
However, it seems incorrect for directories: column 2 shows (theoretical block size) * (replication factor of file) * (number of files in directory).
$ hadoop fs -du -h /hbase/WALs
0 0 /hbase/WALs/hregion-67213513
83 384 M /hbase/WALs/server54-2.abc.cloudera.com,22101,1431434548533
166 768 M /hbase/WALs/server54-3.abc.cloudera.com,22101,1431434548804
83 384 M /hbase/WALs/server54-4.abc.cloudera.com,22101,1431434547067
$ hadoop fs -ls -R /hbase/WALs/server54-2.abc.cloudera.com,22101,1431434548533
-rw-r--r-- 3 hbase hbase 83 2015-05-12 18:42 /hbase/WALs/server54-2.abc.cloudera.com,22101,1431434548533/server54-2.abc.cloudera.com%2C22101%2C1431434548533.default.1431481363257
$ hadoop fs -ls -R /hbase/WALs/server54-3.abc.cloudera.com,22101,1431434548804
-rw-r--r-- 3 hbase hbase 83 2015-05-12 19:42 /hbase/WALs/server54-3.abc.cloudera.com,22101,1431434548804/server54-3.abc.cloudera.com%2C22101%2C1431434548804..meta.1431484974652.meta
-rw-r--r-- 3 hbase hbase 83 2015-05-12 19:42 /hbase/WALs/server54-3.abc.cloudera.com,22101,1431434548804/server54-3.abc.cloudera.com%2C22101%2C1431434548804.default.1431484963607
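The directory numbers above can be reproduced with a little arithmetic. The following Python sketch is only an illustration: it assumes a block size of 128 MB, which is not stated anywhere in this thread but is what 384 M works out to for one file at replication 3. The function names are hypothetical.

```python
BLOCK_SIZE_MB = 128  # assumption: inferred from 384 M / 3 replicas / 1 file
REPLICATION = 3      # from the `-ls -R` listings above


def reported_dir_usage_mb(num_files, block_size_mb=BLOCK_SIZE_MB,
                          replication=REPLICATION):
    """Column 2 as it appears for these directories:
    (block size) * (replication) * (number of files)."""
    return block_size_mb * replication * num_files


def expected_raw_bytes(file_sizes, replication=REPLICATION):
    """What column 2 would be if it followed the single-file rule:
    (sum of file sizes) * (replication)."""
    return sum(file_sizes) * replication


print(reported_dir_usage_mb(1))      # 384, matches the server54-2 and server54-4 rows
print(reported_dir_usage_mb(2))      # 768, matches the server54-3 row
print(expected_raw_bytes([83]))      # 249 bytes for one 83-byte file
print(expected_raw_bytes([83, 83]))  # 498 bytes for two 83-byte files
```

So a directory holding 83 bytes of data is reported as using 384 M of raw space, where the single-file rule would give 249 bytes.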
Created 05-12-2015 07:57 PM
The link to the source is here