Need some help. When the hdfs dfs -du -s -h / command is executed, I see that the user below has 12 TB of trash, but when listing the files the directory is empty. Please find the output below:
-bash-4.2$ hadoop fs -ls /user/mzhou1/.Trash
-bash-4.2$ hadoop fs -du -s /user/mzhou1/.Trash
-bash-4.2$ hadoop fs -ls /user/mzhou1/.Trash
It's not only this user; other directories are showing the same issue. Is it because of snapshots stored in HDFS, where the blocks are still allocated? Please provide inputs.
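From what I have read, one quick way to tell whether snapshots are involved: newer Hadoop releases (2.8+, if I remember right) support a -x flag on du that excludes snapshot data, so comparing the two numbers shows how much of the 12 TB is held only by snapshots. A sketch, assuming your HDP build has the flag:

```shell
# Usage including snapshot-retained blocks:
hdfs dfs -du -s -h /user/mzhou1/.Trash

# Usage excluding snapshots (-x); a large difference between the two
# numbers means deleted files are being kept alive by a snapshot:
hdfs dfs -du -s -h -x /user/mzhou1/.Trash
```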
Thanks for the reply, but I don't see a snapshot for the user ID mentioned below:
-bash-4.2$ hdfs dfs -du -h /user/alin/
1.3 T /user/alin/.Trash
40.5 M /user/alin/.hiveJars
-bash-4.2$ hdfs dfs -du -h /user/alin/.snapshot
du: `/user/alin/.snapshot': No such file or directory
-bash-4.2$ hdfs dfs -ls /user/alin/.snapshot
ls: `/user/alin/.snapshot': No such file or directory
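If it helps anyone looking at this: as far as I understand, a .snapshot directory only exists at the root of the snapshottable directory itself, so a snapshot taken higher up (for example on /user or /) could still be holding /user/alin's deleted blocks even though /user/alin/.snapshot does not exist. A way to check, assuming you can run it as the hdfs superuser:

```shell
# Lists every snapshottable directory on the cluster (run as the hdfs
# superuser, otherwise only your own directories are shown):
hdfs lsSnapshottableDir

# If an ancestor such as /user shows up there, list its snapshots:
hdfs dfs -ls /user/.snapshot
```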
I have extracted the fsimage and am using ELK to analyse the storage used by HDFS; the space consumption of certain directories doesn't match the actual size on the cluster.
E.g. /user/alin is consuming 1 TB of storage according to the hdfs dfs -du command, but the fsimage shows the same user consuming only 40.5 MB. On performing hdfs dfs -ls on /user/alin, I don't see any files accounting for that 1 TB.
I checked with my operations team; they say the HDFS snapshots saved on the cluster produce those values, because the blocks are still allocated, but I don't see any Hortonworks doc mentioning that...
If that's the case, how do I calculate the actual storage used by HDFS? Does the fsimage give accurate data? And how do I make sure the fsimage analysis also picks up snapshot details?
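For reference, this is the kind of cross-check I am doing against the fsimage. The fsimage path and the column layout are assumptions from the build I have used (verify the header line of your own Delimited output); also note that blocks held only by snapshot diffs may not appear as rows in this listing, which could explain the 1 TB gap between du and the fsimage:

```shell
# On the NameNode, dump a recent fsimage copy to tab-delimited text
# (the fsimage filename below is a placeholder):
#   hdfs oiv -i /tmp/fsimage_0000000000000000001 \
#            -o /tmp/fsimage.tsv -p Delimited -delimiter $'\t'

# Simulated single row so the aggregation can be tried locally; in the
# Delimited layout I have seen, column 1 is the path and column 7 is
# the file size in bytes:
printf '/user/alin/.Trash/part-0001\t3\t2019-01-01 00:00\t2019-01-01 00:00\t134217728\t8\t1073741824\t0\t0\t-rw-r--r--\talin\thdfs\n' > /tmp/fsimage.tsv

# Sum bytes per /user/<name> and print the totals in GB:
awk -F'\t' '$1 ~ /^\/user\// { split($1, p, "/"); gb["/"p[2]"/"p[3]] += $7 }
            END { for (d in gb) printf "%s\t%.1f GB\n", d, gb[d]/2^30 }' /tmp/fsimage.tsv
# -> /user/alin    1.0 GB   (for the simulated row above)
```

Comparing these per-user totals with hdfs dfs -du -s output is how I spotted the mismatch in the first place.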