Created 07-21-2016 05:16 AM
I need to find the files/directories that are occupying the most space in HDFS, so that I can ask the owners of those directories to clean them up. What command can provide this information? I also notice that the NameNode heap usage is increasing and is around 80-90%. Is there a way to find which files cause this, i.e. which directory has too many small files, or how to identify the small files that might be responsible?
Or is there any other reason that could be causing the NameNode heap usage to increase?
My NameNode memory is 3 GB.
Created 07-21-2016 06:10 AM
1) Identifying directory usage in HDFS
Run the command below for each directory to check its usage:
[root@abc01 ~]# hadoop fs -du -s -h <hdfs location>
50.9 M  /abc
[root@abc01 ~]#
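To find the largest directories in one pass instead of checking each path by hand, you can sort the plain -du output numerically. A minimal sketch (the paths are only examples, and the exact -du output columns differ slightly between Hadoop versions):

# List the size of every entry under /user, sort by the first (byte-size) column, show the top 10
hadoop fs -du /user | sort -rn | head -10

# Human-readable summary of a few candidate locations (quote the glob so HDFS expands it)
hadoop fs -du -s -h '/user/*' /tmp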
2) NameNode heap usage increasing to 80-90%
The NameNode keeps the entire HDFS namespace (every file, directory, and block) in memory, so heap usage grows with the number of objects in the cluster; a large number of small files will drive it up even if the total data size is modest.
Here is the link for sizing the NameNode heap based on the number of files.
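To see which directories hold the most files (the likely heap consumers), hadoop fs -count prints the directory count, file count, and content size for each path; a small sketch with example paths only:

# Columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
hadoop fs -count '/user/*' | sort -k2 -rn | head -10

A path with a large FILE_COUNT but a small CONTENT_SIZE is a good small-file candidate, since every file and block consumes NameNode heap regardless of its size.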
I hope this will help you.
Created 07-21-2016 06:04 AM
Try the command:
hadoop fs -du -h /
This will show the space occupied by each directory at the top level of HDFS. To drill down, change / to the directory you want to check.
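For example, starting from the root and drilling into whichever directory dominates (the directory names below are just placeholders):

# Top-level usage, human-readable
hadoop fs -du -h /
# Drill into the largest entry and repeat as needed
hadoop fs -du -h /user
hadoop fs -du -h /user/some_user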