Member since
03-17-2017
2
Posts
0
Kudos Received
0
Solutions
03-20-2017
08:08 AM
The dfsadmin -report command only returns data at the cluster and node level, which is not sufficient for me. For example, our cluster hosts several projects, each with a different replication factor, and we want to know what these projects actually consume on disk. Since the projects live in different sub-directories, we can retrieve their usage with hdfs dfs -du / , but the numbers are not what we need, since the command does not take the replication factor into account.
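If the replication factor of each project directory is known, the -du numbers can be scaled to an estimate of raw on-disk usage. A minimal sketch in Python, assuming the two-column output format of hdfs dfs -du (logical bytes, then path); the paths and replication factors below are hypothetical examples:

```python
# Sketch: estimate raw on-disk usage by scaling the logical sizes
# reported by `hdfs dfs -du` with each project's replication factor.
# The paths and factors here are hypothetical, not real cluster data.

REPLICATION = {
    "/projects/alpha": 3,  # assumed per-project replication factors
    "/projects/beta": 2,
}
DEFAULT_FACTOR = 3  # assumed cluster default replication


def raw_usage(du_output: str) -> dict:
    """Parse `hdfs dfs -du` lines of the form '<bytes> <path>' and
    return estimated raw bytes (logical size * replication factor)."""
    usage = {}
    for line in du_output.strip().splitlines():
        size, path = line.split(None, 1)
        usage[path] = int(size) * REPLICATION.get(path, DEFAULT_FACTOR)
    return usage


sample = "1073741824 /projects/alpha\n536870912 /projects/beta"
print(raw_usage(sample))
# {'/projects/alpha': 3221225472, '/projects/beta': 1073741824}
```

This is only an approximation: it assumes every file under a directory shares the same replication factor, which need not hold when individual files were written with a different setting.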
03-17-2017
11:41 AM
I am trying to create a report of HDFS space usage per directory. The command I am using is hdfs dfs -du / . While this gives an overview, it does not take the replication factor into account. Requesting more info via the quota command hdfs dfs -count -q / shows raw space usage where quotas are set, but certainly not for all of our directories. So these commands on their own are not enough to calculate the correct space usage. Does anyone have a good approach to calculating space usage correctly?
Labels:
- Apache Hadoop