HDFS disk usage is 100%

Our Ambari cluster's HDFS usage is at 100%.

I tried to capture the trash size with hadoop dfsadmin -report,

but the output does not show the trash size. How can I find out the size of the trash?

Also, which CLI commands show all the space used on HDFS, including trash?

Second, in order to solve the 100% HDFS usage,

we are thinking of adding disks to the worker machines. Would that help in our case?

If yes, how do we rebalance the data onto the new disks?

Michael-Bronson
1 ACCEPTED SOLUTION

Super Collaborator

If you want to see the usage within DFS, this should provide you with the disk usage:

hdfs dfs -du -h /
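
For the overall picture of configured capacity, used and remaining space on HDFS, the filesystem summary should also help:

hdfs dfs -df -h /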

To see the size of the trash dir use this command:

hdfs dfs -du -h

To add a new disk (in the normal case), you typically decommission the DataNode service on the worker node, add the disk, and then recommission it. HDFS will try to replicate the blocks from that node to the other nodes to avoid data loss, and I'm not sure whether an already full HDFS will cause errors here. Can you try to (temporarily) add nodes? That adds HDFS capacity, so decommissioning a node should then be safe, giving you a way to increase the local disk capacity.

I'm not sure whether the rebalancing needs to be triggered manually; I believe it will start automatically (causing additional load on the nodes during that time).
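
If the balancer does need to be started manually, it can be run from the CLI; the threshold (the allowed deviation in disk usage percentage between nodes) is optional, and 10 below is just an example value:

hdfs balancer -threshold 10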


7 REPLIES

Can you please show me how to decommission? What do I need to do before and after?

Michael-Bronson

Super Collaborator

In Ambari, go to the host details; there you can click the button to the right of the 'DataNode HDFS' service line (screenshot: screenshot-decomission.png).

You should turn on maintenance mode beforehand to avoid alerts.
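
For completeness: on a cluster that is not managed by Ambari, the usual manual route is to add the host to the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml and then tell the NameNode to re-read it; on an Ambari-managed cluster the GUI steps above are the preferred way:

hdfs dfsadmin -refreshNodes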

From the output I see that:

hdfs dfs -du -h /
398.2 M  /app-logs
7.5 M    /apps
3.5 M    /ats
695.6 M  /hdp
0        /mapred
0        /mr-history
53.1 G   /spark2-history
0        /tmp
516.3 M  /user

Is it possible to delete the Spark history from the CLI? If not, how can I delete the Spark history from the Ambari GUI?

Michael-Bronson

Super Collaborator

For the trash dir, also try to execute the command without the / at the end.
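
The per-user trash normally lives under the home directory, so you can also check it directly; replace the username with your own (hdfs below is only an example):

hdfs dfs -du -h /user/hdfs/.Trash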

Super Collaborator

To delete /spark2-history:

hdfs dfs -rm -r /spark2-history/*
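
Note that with trash enabled, a plain -rm only moves the files into the trash and does not free space immediately; to reclaim space right away, add the -skipTrash option:

hdfs dfs -rm -r -skipTrash /spark2-history/*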

Yes, the trash dir is small and isn't the problem.

Michael-Bronson