Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

files/directories occupying maximum space in hdfs


I need to find the files and directories that occupy the most space in HDFS, so that I can ask the owners of those directories to clean up. Which command provides this information? I also notice that the NameNode heap usage is increasing and is around 80-90%. Is there a way to find which files cause this, i.e. which directory has too many small files, or how to identify the small files that might be responsible?

Or is there any other reason that could cause the NameNode heap usage to increase?

My NameNode memory is 3 GB.

1 ACCEPTED SOLUTION


@ARUN

1) Identifying directory usage in HDFS

Run the following for each directory to see its usage:

[root@abc01 ~]# hadoop fs -du -s -h <hdfs location>
50.9 M  /abc
[root@abc01 ~]#
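When there are many directories, running -du one path at a time is tedious. A minimal sketch of ranking directories by consumed space (assumes a working hadoop client on the PATH; /user/* is a hypothetical path pattern, and the glob is quoted so HDFS, not the local shell, expands it):

```shell
# Rank HDFS directories by space consumed, largest first.
# Without -h, -du prints raw byte counts, which sort cleanly with sort -n.
hadoop fs -du -s '/user/*' | sort -rn | head -10
```

The first column is the per-directory total in bytes; it can be converted back to human-readable units after sorting if needed.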

2) NameNode heap size is increasing and is around 80-90%

NameNode heap usage grows with the number of files, directories, and blocks in HDFS, because the NameNode holds metadata for every object in memory. A large number of small files is therefore a common cause of heap pressure, independent of the total bytes stored.

Here is a link with recommended heap values based on the number of files:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-809...
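Since NameNode heap is driven by object counts rather than bytes, it also helps to rank directories by number of files to find small-file hotspots. A sketch under the same assumptions as above (hadoop client available; /user/* hypothetical); the -count output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME:

```shell
# Find directories holding the most files (small-file suspects).
# Sort numerically on the second column (FILE_COUNT), largest first.
hadoop fs -count '/user/*' | sort -k2,2 -rn | head -10
```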

I hope this will help you.
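As a rough cross-check of the 3 GB heap mentioned above, a commonly cited rule of thumb is about 1 GB of NameNode heap per million file-system objects (files, directories, and blocks). Treat this as an approximation for back-of-envelope sizing, not a guarantee; the object count below is a hypothetical example:

```shell
# Rough NameNode heap estimate from an assumed object count.
objects=2500000                              # files + directories + blocks (example)
est_gb=$(( (objects + 999999) / 1000000 ))   # ~1 GB per million objects, rounded up
echo "Estimated minimum NameNode heap: ${est_gb} GB"
```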


2 REPLIES

Expert Contributor

@ARUN

Try the command

hadoop fs -du -h /

This shows the space occupied by each directory under / in HDFS. To drill down, replace / with the directory you want to check.
