HDFS du and fsck command shows different storage value

Contributor

I dropped partitions from a managed table and then checked the size of the table with the commands below.

hdfs dfs -du and hdfs fsck report different storage sizes:

du on t_table shows 7 TB

du on all the partitions of t_table shows about 600 GB

fsck on t_table shows about 600 GB

Why do they show different values?

hdfs dfs -du -s -h /data/warehouse/developer.db/t_table/
7.0 T 21.1 T /data/warehouse/developer.db/t_table

hdfs dfs -du -s -h /data/warehouse/developer.db/t_table/*
31.0 G 93.0 G /data/warehouse/developer.db/t_table/day_string=2019-01-01
30.9 G 92.7 G /data/warehouse/developer.db/t_table/day_string=2019-01-02
31.0 G 92.9 G /data/warehouse/developer.db/t_table/day_string=2019-01-03
31.0 G 93.0 G /data/warehouse/developer.db/t_table/day_string=2019-01-04
31.0 G 92.9 G /data/warehouse/developer.db/t_table/day_string=2019-01-05
31.1 G 93.2 G /data/warehouse/developer.db/t_table/day_string=2019-01-06
31.0 G 93.1 G /data/warehouse/developer.db/t_table/day_string=2019-01-07
31.0 G 93.1 G /data/warehouse/developer.db/t_table/day_string=2019-01-08
31.2 G 93.6 G /data/warehouse/developer.db/t_table/day_string=2019-01-09
31.0 G 93.1 G /data/warehouse/developer.db/t_table/day_string=2019-01-10
31.1 G 93.2 G /data/warehouse/developer.db/t_table/day_string=2019-01-11
31.1 G 93.4 G /data/warehouse/developer.db/t_table/day_string=2019-01-12
31.1 G 93.3 G /data/warehouse/developer.db/t_table/day_string=2019-01-13
31.1 G 93.4 G /data/warehouse/developer.db/t_table/day_string=2019-01-14
31.2 G 93.5 G /data/warehouse/developer.db/t_table/day_string=2019-01-15
31.1 G 93.2 G /data/warehouse/developer.db/t_table/day_string=2019-01-16
31.2 G 93.5 G /data/warehouse/developer.db/t_table/day_string=2019-01-17
31.1 G 93.4 G /data/warehouse/developer.db/t_table/day_string=2019-01-18

hdfs fsck /data/warehouse/developer.db/t_table/
Total size: 600360486250 B
Total dirs: 19
Total files: 1660
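
A note on units, since the figures above mix them: fsck reports plain bytes, while du -h prints binary units (G meaning 1024^3 bytes). The following is a small sketch for comparing the two, assuming the du -h output format shown above (value and unit separated by a space). The helper names bytes_to_gib and sum_du_gib are illustrative only and not part of any Hadoop tool.

```shell
# Hypothetical helper: convert a byte count (as printed by fsck) to GiB.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Hypothetical helper: read `hdfs dfs -du -h` lines on stdin and add up
# the first column for rows whose unit is G. Rows in other units are skipped.
sum_du_gib() {
  awk '$2 == "G" { total += $1 } END { printf "%.1f\n", total }'
}

# Usage:
# bytes_to_gib 600360486250
# hdfs dfs -du -h /data/warehouse/developer.db/t_table/ | sum_du_gib
```

Here bytes_to_gib 600360486250 prints 559.1, which agrees with the roughly 559 G you get by adding the 18 per-partition figures. So the fsck total and the per-partition du output describe the same data; only the 7.0 T figure is the outlier.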


Re: HDFS du and fsck command shows different storage value

Contributor

Does anyone have any idea?

Re: HDFS du and fsck command shows different storage value

Master Guru
fsck counts only the live (current) files and directories under the path,
whereas du also counts, by default (when -x is not passed), all snapshots
living under the same path.

Check whether the observed path is snapshotted via 'hdfs lsSnapshottableDir'.
See also
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html
for more info on how to use and control snapshots.
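
To see how much of the du figure comes from snapshots, one option is to run du with and without -x and subtract. This is a minimal sketch, assuming a Hadoop release whose du supports -x and that the first column of the non -h output is the size in bytes. The function name snapshot_overhead_bytes is made up for illustration.

```shell
# Hypothetical helper: bytes that `du` attributes to snapshots under a path.
snapshot_overhead_bytes() {
  path="$1"
  # Without -h the first column is a plain byte count, so it can be subtracted.
  with_snapshots=$(hdfs dfs -du -s "$path" | awk '{ print $1; exit }')
  # -x excludes snapshots from the calculation.
  without_snapshots=$(hdfs dfs -du -s -x "$path" | awk '{ print $1; exit }')
  echo $(( with_snapshots - without_snapshots ))
}

# Usage (path taken from this thread):
# snapshot_overhead_bytes /data/warehouse/developer.db/t_table
```

A result close to zero would suggest snapshots are not the cause; a large result points to snapshot data still being held under the path.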

Re: HDFS du and fsck command shows different storage value

Contributor

Thank you for your reply!!

hdfs lsSnapshottableDir
shows nothing.
hdfs dfs -ls /data/warehouse/developer.db/t_table/.snapshot
returns "No such file".

The table was an external table. I altered it to a managed table and then dropped the partitions, expecting the data and metadata to be dropped together. Instead I see the issue above: du on the table still shows the old size of 7 TB, while du on all the partitions shows the correct size of about 600 GB after the drop. fsck shows the correct value as well.

Re: HDFS du and fsck command shows different storage value

Contributor

It looks like a snapshot is causing the issue.

hdfs dfs -du -s -h -x /data/warehouse/developer.db/t_table/

shows the correct value.

hdfs lsSnapshottableDir (which gets all the snapshottable directories where the current user has permission to take snapshots)
shows nothing, possibly because I don't have permission.

Re: HDFS du and fsck command shows different storage value

Contributor

Why doesn't dropping a partition remove the snapshot?

Re: HDFS du and fsck command shows different storage value

Master Guru
Snapshot removal can only be done explicitly, and Hive/Impala DROP
functions do not cover Snapshot operations.

Your question of 'why is the snapshot not removed' must be directed to the
team/person in your org. making these snapshots, because it is not an
automatic feature of Hive/Impala/etc., but has to be done manually instead
(or via DistCp-like operations when explicitly indicated).

If you use Cloudera Manager (Enterprise), then BDR lets you schedule
snapshot creation and deletion, so you can control the overall data
retention over time:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_snapshot_intro.html
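
On the earlier "No such file" result: as far as I know, the .snapshot path resolves only on the directory that was itself made snapshottable, so if the snapshot was taken on a parent (for example the database or warehouse directory), t_table/.snapshot will not exist. Below is a rough sketch that walks up the tree looking for a listable .snapshot directory. find_snapshot_root is a hypothetical helper name, and the sketch assumes you have permission to list the ancestor directories.

```shell
# Hypothetical helper: print the nearest ancestor (or the path itself)
# whose .snapshot directory can be listed; return 1 if none is found.
find_snapshot_root() {
  dir="${1%/}"                 # drop a trailing slash, if any
  [ -n "$dir" ] || dir="/"
  while :; do
    if hdfs dfs -ls "${dir%/}/.snapshot" >/dev/null 2>&1; then
      printf '%s\n' "$dir"
      return 0
    fi
    [ "$dir" = "/" ] && return 1
    dir=$(dirname "$dir")
  done
}

# Usage (path taken from this thread; snapshot name is a placeholder):
# root=$(find_snapshot_root /data/warehouse/developer.db/t_table/)
# hdfs dfs -ls "$root/.snapshot"                  # list snapshot names
# hdfs dfs -deleteSnapshot "$root" <snapshotName> # remove one explicitly
```

Note that a snapshottable directory with no snapshots also lists successfully (with empty output), so check the listing before drawing conclusions, and coordinate with whoever owns the snapshots before deleting any.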

Re: HDFS du and fsck command shows different storage value

Contributor

That's really helpful!! Thank you for your reply.
