HDFS disk usage issues

Hi All,

I need some help. When I run hdfs dfs -du -s -h /, I see that the user below has 12 T in trash, but when I list the files the directory is empty. Please see the output below:

-bash-4.2$ hadoop fs -ls /user/mzhou1/.Trash

-bash-4.2$ hadoop fs -du -s /user/mzhou1/.Trash

12248499878795 /user/mzhou1/.Trash

-bash-4.2$ hadoop fs -ls /user/mzhou1/.Trash


This is not limited to this user; other directories show the same issue. Is it because of snapshots stored in HDFS, for which blocks are still allocated? Please provide your inputs.


@sudi ts

Yes, there might be a snapshot covering that directory. Please delete all snapshots of that directory and check again after a checkpoint.
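
Before deleting anything, it may help to check how much of that figure is actually held by snapshots. Below is a rough sketch, assuming your Hadoop version supports the -x option of -du (added in Hadoop 2.8; it excludes snapshots from the calculation). The name snapshot_bytes is just a helper I made up for illustration.

```shell
# Sketch: estimate the bytes that du counts only because snapshots
# still reference them. Assumes `hdfs` is on PATH and that
# `-du -x` (exclude snapshots) is available in your Hadoop version.
# snapshot_bytes is a hypothetical helper name.
snapshot_bytes() {
  # First column of `-du -s` output is the size in bytes.
  sb_with=$(hdfs dfs -du -s "$1" | awk '{print $1}')
  sb_without=$(hdfs dfs -du -s -x "$1" | awk '{print $1}')
  echo $(( sb_with - sb_without ))
}
```

For example, snapshot_bytes /user/mzhou1/.Trash prints the difference in bytes. A value close to the full du figure would suggest the space is held by snapshots rather than by live files.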

Hope this helps you.


Thanks for the reply,

but I don't see a snapshot for the user below:

-bash-4.2$ hdfs dfs -du -h /user/alin/

1.3 T /user/alin/.Trash

40.5 M /user/alin/.hiveJars

-bash-4.2$ hdfs dfs -du -h /user/alin/.snapshot

du: `/user/alin/.snapshot': No such file or directory

-bash-4.2$ hdfs dfs -ls /user/alin/.snapshot

ls: `/user/alin/.snapshot': No such file or directory

I have extracted the fsimage and am using ELK to view the storage used by HDFS. The space consumption of certain directories doesn't match the actual size on the cluster.

E.g. /user/alin is consuming about 1.3 T of storage according to the hdfs dfs -du command, but the fsimage shows the same user consuming 40.5 M. When I run hdfs dfs -ls on /user/alin, I don't see any files adding up to 1 TB.

My operations team says this is because HDFS snapshots saved on the cluster keep those blocks allocated, but I don't see any Hortonworks documentation mentioning that.

If that's the case, how do I calculate the actual storage used by HDFS? Does the fsimage give accurate data?

How do I make sure the fsimage also includes snapshot details?

@sudi ts

Run the following:

hadoop fs -ls /user/mzhou1/.snapshot

Then also run:

hadoop fs -du -s /user/mzhou1/.snapshot
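
If .snapshot does not exist on the user directory itself, the snapshot may have been taken on a parent directory such as /user or /, since .snapshot only appears at the root of a snapshottable directory. As far as I know, du on a subdirectory can still count blocks that such a parent snapshot keeps alive. Here is a small sketch that walks up the path and checks each level. The names ancestors and check_snapshots are my own, and it assumes the hdfs CLI is on PATH and you have permission to list those directories.

```shell
# Sketch: find which ancestor of a path is snapshottable.
# ancestors and check_snapshots are hypothetical helper names.

# Print the path itself and every ancestor up to "/", one per line.
ancestors() {
  an_p="${1%/}"               # drop a trailing slash
  while [ -n "$an_p" ]; do
    printf '%s\n' "$an_p"
    an_p="${an_p%/*}"         # strip the last path component
  done
  printf '/\n'
}

# For each level, report whether a .snapshot directory is visible there
# and, if so, list the snapshots it contains.
check_snapshots() {
  ancestors "$1" | while read -r cs_dir; do
    if hdfs dfs -ls "${cs_dir%/}/.snapshot" >/dev/null 2>&1; then
      echo "snapshottable: $cs_dir"
      hdfs dfs -ls "${cs_dir%/}/.snapshot"
    fi
  done
}
```

For example: check_snapshots /user/alin/.Trash. Alternatively, hdfs lsSnapshottableDir lists the snapshottable directories visible to you (all of them when run as the HDFS superuser).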

