Files/directories occupying maximum space in HDFS
Labels: Apache Ambari, Apache Hadoop

Created 07-21-2016 05:16 AM
I need to find the files/directories that are occupying the most space in HDFS, so that I can ask the owners of those directories to clean them up. What command can provide this information?

I also notice that the NameNode heap usage is increasing and is around 80-90%. Is there a way to find which files cause this, i.e. which directories have too many small files, or how to identify the small files that might be responsible? Or is there any other reason that could be causing the NameNode heap usage to increase?

My NameNode memory is 3 GB.
Created 07-21-2016 06:10 AM
1) Identifying directory usage in HDFS

Run the following for each directory to check its usage:

[root@abc01 ~]# hadoop fs -du -s -h <hdfs location>
50.9 M /abc
[root@abc01 ~]#
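To rank directories without checking each path by hand, you can sort the raw byte counts instead; this is a minimal sketch, assuming shell access to an HDFS client node and using /user as an illustrative path:

# without -h, hadoop fs -du prints sizes in bytes, which sort -rn can order
# shows the ten largest entries under /user, biggest first
hadoop fs -du /user | sort -rn | head -n 10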
2) NameNode heap size is increasing and is around 80-90%.

The NameNode keeps the metadata for every file, directory, and block in memory, so its heap usage grows with the number of objects in HDFS rather than with the raw data volume. A large number of small files is therefore a common cause. Here is the link for sizing the NameNode heap based on the number of files.
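To locate small-file hot spots, one option is to compare per-directory file counts against total content size; this is a sketch with placeholder paths, and the commonly cited rule of thumb is that each file, directory, and block costs roughly 150 bytes of NameNode heap:

# hdfs dfs -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# a high file count with a small total size points to a small-file problem
hdfs dfs -count '/user/*' | sort -k2 -rn | head -n 10

# list individual files under 1 MB in a suspect directory
# (column 5 of ls -R output is the size in bytes; skip directory entries)
hdfs dfs -ls -R /user/suspect_dir | awk '$1 !~ /^d/ && $5 < 1048576 {print $5, $8}'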
I hope this will help you.
Created 07-21-2016 06:04 AM
Try the command:

hadoop fs -du -h /

This should show the space occupied by each directory in HDFS. To drill down, change / to the directory you want to check.
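Note that the human-readable sizes printed by -h do not sort numerically; to rank directories, drop -h and pipe the byte counts through sort -rn. To see how full the filesystem is overall before drilling down, a quick check:

# overall HDFS capacity, used space, and percentage used
hadoop fs -df -h /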