Explorer
Posts: 38
Registered: ‎09-29-2016

HDFS du and fsck command shows different storage value

I dropped partitions from a managed table and then checked the size of the table with the commands below.

hdfs dfs -du and hdfs fsck report different storage sizes:

du on t_table shows 7 TB

du on all the partitions of t_table shows about 600 GB in total

fsck on t_table shows about 600 GB

Why do they show different values?


hdfs dfs -du -s -h /data/warehouse/developer.db/t_table/
7.0 T 21.1 T /data/warehouse/developer.db/t_table

hdfs dfs -du -s -h /data/warehouse/developer.db/t_table/*
31.0 G 93.0 G /data/warehouse/developer.db/t_table/day_string=2019-01-01
30.9 G 92.7 G /data/warehouse/developer.db/t_table/day_string=2019-01-02
31.0 G 92.9 G /data/warehouse/developer.db/t_table/day_string=2019-01-03
31.0 G 93.0 G /data/warehouse/developer.db/t_table/day_string=2019-01-04
31.0 G 92.9 G /data/warehouse/developer.db/t_table/day_string=2019-01-05
31.1 G 93.2 G /data/warehouse/developer.db/t_table/day_string=2019-01-06
31.0 G 93.1 G /data/warehouse/developer.db/t_table/day_string=2019-01-07
31.0 G 93.1 G /data/warehouse/developer.db/t_table/day_string=2019-01-08
31.2 G 93.6 G /data/warehouse/developer.db/t_table/day_string=2019-01-09
31.0 G 93.1 G /data/warehouse/developer.db/t_table/day_string=2019-01-10
31.1 G 93.2 G /data/warehouse/developer.db/t_table/day_string=2019-01-11
31.1 G 93.4 G /data/warehouse/developer.db/t_table/day_string=2019-01-12
31.1 G 93.3 G /data/warehouse/developer.db/t_table/day_string=2019-01-13
31.1 G 93.4 G /data/warehouse/developer.db/t_table/day_string=2019-01-14
31.2 G 93.5 G /data/warehouse/developer.db/t_table/day_string=2019-01-15
31.1 G 93.2 G /data/warehouse/developer.db/t_table/day_string=2019-01-16
31.2 G 93.5 G /data/warehouse/developer.db/t_table/day_string=2019-01-17
31.1 G 93.4 G /data/warehouse/developer.db/t_table/day_string=2019-01-18

hdfs fsck /data/warehouse/developer.db/t_table/
Total size: 600360486250 B
Total dirs: 19
Total files: 1660
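
As a sanity check on the numbers above, here is a small sketch that adds up the per-partition sizes from the du output and compares them with the fsck total. The values are copied by hand from the listing; the variable names are only for illustration. Note that du -h prints binary units (1 G = 1024^3 bytes), while the "600 GB" figure is the fsck byte count read in decimal.

```shell
# Per-partition sizes: first column of the du output above, in G (binary).
sizes="31.0 30.9 31.0 31.0 31.0 31.1 31.0 31.0 31.2 31.0 31.1 31.1 31.1 31.1 31.2 31.1 31.2 31.1"

# "Total size" reported by fsck, in bytes.
fsck_bytes=600360486250

# Sum the partition sizes as printed by du -h.
partition_sum=$(echo "$sizes" | tr ' ' '\n' | awk '{ s += $1 } END { printf "%.1f", s }')

# Convert the fsck total to binary GiB and to decimal GB.
fsck_gib=$(awk -v b="$fsck_bytes" 'BEGIN { printf "%.1f", b / (1024 * 1024 * 1024) }')
fsck_gb=$(awk -v b="$fsck_bytes" 'BEGIN { printf "%.1f", b / 1e9 }')

echo "partitions: ${partition_sum} G, fsck: ${fsck_gib} GiB (${fsck_gb} GB)"
```

This prints "partitions: 559.2 G, fsck: 559.1 GiB (600.4 GB)", so the per-partition du and fsck agree with each other; only the du on the table directory itself (7.0 T) is out of line. The second du column (21.1 T, 93.0 G, ...) is roughly three times the first, which is consistent with a replication factor of 3.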

Explorer
Posts: 38
Registered: ‎09-29-2016

Re: HDFS du and fsck command shows different storage value

Does anyone have any idea?

Posts: 1,826
Kudos: 406
Solutions: 292
Registered: ‎07-31-2013

Re: HDFS du and fsck command shows different storage value

FSCK counts only the live (current) files under the path, whereas DU also
counts (by default, without -x passed) all snapshots living under the same path.

Check if this observed path is snapshotted via 'hdfs lsSnapshottableDir'.
See also
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html
for more info on how to use and control snapshots.
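
One way to see how much of the du figure comes from snapshots is to run du with and without -x and subtract. The helper below is only a sketch: the function name snapshot_overhead_bytes is made up for illustration, and it assumes the three-column du output shown earlier in this thread (size, disk space consumed, path) and a Hadoop version whose du supports -x.

```shell
# Print the number of bytes that du attributes to snapshots under a path:
# (du including snapshots) minus (du with -x, which excludes them).
# Assumption: the first column of "hdfs dfs -du -s" is the size in bytes.
snapshot_overhead_bytes() {
  so_path="$1"
  so_with=$(hdfs dfs -du -s "$so_path" | awk '{ print $1 }')
  so_without=$(hdfs dfs -du -s -x "$so_path" | awk '{ print $1 }')
  echo $(( so_with - so_without ))
}

# Example call (needs a running cluster):
#   snapshot_overhead_bytes /data/warehouse/developer.db/t_table
```

A result of 0 would suggest that snapshots are not the cause; a large value points at data that is kept alive only by snapshots.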
Explorer
Posts: 38
Registered: ‎09-29-2016

Re: HDFS du and fsck command shows different storage value

Thank you for your reply!!

hdfs lsSnapshottableDir 
shows nothing
hdfs dfs -ls /data/warehouse/developer.db/t_table/.snapshot
No such file

The table was an external table. I altered it to a managed table and then dropped the partitions, expecting that to remove the data and the metadata together. But I see the issue above: du on the table still shows the old size of 7 TB, while du on all the partitions shows the correct size of about 600 GB after the drop. fsck shows the correct value as well.

Explorer
Posts: 38
Registered: ‎09-29-2016

Re: HDFS du and fsck command shows different storage value

It looks like a snapshot is causing the issue.

hdfs dfs -du -s -h -x /data/warehouse/developer.db/t_table/

shows the correct value.

hdfs lsSnapshottableDir -- Get all the snapshottable directories where the current user has permission to take snapshots.
It shows nothing, maybe because I don't have permission.
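
A possible explanation for the earlier "No such file" on t_table/.snapshot: the .snapshot directory is only reachable at the directory that was made snapshottable, which may be a parent of the table rather than the table itself. The sketch below walks up from a path and reports the first ancestor where listing .snapshot succeeds. The function name find_snapshot_root is hypothetical, and a user without read access to an ancestor may still get no result.

```shell
# Walk from a path up towards / and print the first directory
# for which "hdfs dfs -ls <dir>/.snapshot" succeeds.
find_snapshot_root() {
  fs_path="${1%/}"                      # drop a trailing slash, if any
  while [ -n "$fs_path" ]; do
    if hdfs dfs -ls "$fs_path/.snapshot" >/dev/null 2>&1; then
      echo "$fs_path"
      return 0
    fi
    fs_path="${fs_path%/*}"             # move up one level
  done
  # Finally check the root directory itself.
  if hdfs dfs -ls "/.snapshot" >/dev/null 2>&1; then
    echo "/"
    return 0
  fi
  return 1
}

# Example call (needs a running cluster):
#   find_snapshot_root /data/warehouse/developer.db/t_table/
```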

 

Explorer
Posts: 38
Registered: ‎09-29-2016

Re: HDFS du and fsck command shows different storage value

Why doesn't dropping a partition remove the snapshot?

Posts: 1,826
Kudos: 406
Solutions: 292
Registered: ‎07-31-2013

Re: HDFS du and fsck command shows different storage value

Snapshot removal can only be done explicitly, and Hive/Impala DROP
functions do not cover Snapshot operations.

Your question of 'why is the snapshot not removed' must be directed to the
team/person in your org. making these snapshots, because it is not an
automatic feature of Hive/Impala/etc., but has to be done manually instead
(or via DistCp-like operations when explicitly indicated).

If you use Cloudera Manager (Enterprise), then BDR lets you schedule
snapshot creation and deletion, so you can control the overall data
retention over time:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_snapshot_intro.html
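
For the manual route, snapshots are listed under <snapshottable dir>/.snapshot and removed one at a time with hdfs dfs -deleteSnapshot <path> <snapshotName>. The sketch below only prints the delete commands for review and does not run them. The function name print_delete_commands is hypothetical, and it assumes the usual ls output where the last field of each entry is the full snapshot path and snapshot names contain no spaces.

```shell
# Print (do not execute) one deleteSnapshot command per snapshot
# found under a snapshottable directory.
print_delete_commands() {
  pd_root="${1%/}"
  hdfs dfs -ls "$pd_root/.snapshot" | awk -v root="$pd_root" '
    # Keep only entry lines whose last field is a path under .snapshot;
    # this skips the "Found N items" header.
    $NF ~ /\/\.snapshot\// {
      n = split($NF, parts, "/")
      printf "hdfs dfs -deleteSnapshot %s %s\n", root, parts[n]
    }'
}

# Example call (needs a running cluster; the path is an assumption):
#   print_delete_commands /data/warehouse
```

The space used by the dropped partitions is released only once no remaining snapshot references those files, so check with whoever owns the snapshots before deleting any of them.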
Explorer
Posts: 38
Registered: ‎09-29-2016

Re: HDFS du and fsck command shows different storage value

That's really helpful !! Thank you for your reply 
