Created 02-09-2018 11:51 AM
Our Ambari cluster's HDFS is at 100%.
I tried to capture the trash size with hadoop dfsadmin -report,
but the output does not show the trash size, so how can I find out the size of the trash?
And which CLI commands show everything that is using space on HDFS, including trash?
Second, to resolve the HDFS 100% situation, we are thinking of adding disks to the worker machines. Is that useful in our case?
If yes, how do we rebalance the data onto the new disks?
Created 02-09-2018 02:22 PM
If you want to see the usage within DFS, this should provide you with the disk usage:
hdfs dfs -du -h /
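If you also want the overall filesystem capacity and usage in one line (a quick way to confirm how full HDFS is), the df variant of the same command should work:
hdfs dfs -df -h /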
To see the size of the trash dir, use this command (with no path argument it lists your HDFS home directory, which is where .Trash lives):
hdfs dfs -du -h
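To target the trash folder directly, something like this should also work; <username> is a placeholder, and the path assumes the default trash location:
hdfs dfs -du -s -h /user/<username>/.Trash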
To add a new disk (in the normal mode), you typically decommission the DataNode service on the worker node, add the disk, and then recommission it. During decommissioning HDFS will replicate the blocks from that node to the other nodes to avoid data loss; I'm not sure whether an already full HDFS will cause errors there. Can you try to (temporarily) add nodes? That adds HDFS capacity, and with it the decommissioning of a single node should be fine, giving you a way to increase the local disk capacity.
I'm not sure whether the rebalancing needs to be triggered manually; I believe it will start automatically (causing additional load on the nodes while it runs).
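If it turns out you do need to trigger it manually, the standard tool is the HDFS balancer; a typical invocation looks like this (the threshold percentage is just an example value):
hdfs balancer -threshold 10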
Created 02-09-2018 02:40 PM
Can you please show me how to decommission, and what to do before and after?
Created 02-09-2018 02:58 PM
In Ambari, go to the host details; there you can click the action button next to the 'DataNode / HDFS' service line and choose Decommission. (See the attached screenshot: screenshot-decomission.png.)
You should turn on maintenance mode beforehand to avoid alerts.
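For reference, the equivalent outside Ambari is the exclude-file route; a sketch, assuming dfs.hosts.exclude points at the file below (both the hostname and the path are examples):
echo "worker1.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes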
Created 02-09-2018 02:44 PM
From the output I see:
hdfs dfs -du -h /
398.2 M  /app-logs
7.5 M    /apps
3.5 M    /ats
695.6 M  /hdp
0        /mapred
0        /mr-history
53.1 G   /spark2-history
0        /tmp
516.3 M  /user
Is it possible to delete the Spark history from the CLI? If not, how can I delete the Spark history from the Ambari GUI?
Created 02-09-2018 02:54 PM
For the trash dir, try also executing the command without the / at the end (so it lists your home directory, which contains .Trash).
Created 02-09-2018 03:01 PM
To delete /spark2-history:
hdfs dfs -rm -r /spark2-history/*
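Note that with trash enabled, -rm only moves the files into .Trash, which won't free space immediately; on a full cluster you may want to bypass the trash (use with care, this deletes immediately and unrecoverably):
hdfs dfs -rm -r -skipTrash /spark2-history/*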
Created 02-09-2018 02:57 PM
Yes, the trash dir is small and isn't the problem.