Created 02-09-2018 11:51 AM
Our Ambari cluster's HDFS is at 100%.
I tried to capture the trash size with hadoop dfsadmin -report,
but the output does not show the trash size, so how can I find out the size of the trash?
And which CLI commands show everything that is using space on HDFS, including trash?
Second, to resolve the HDFS 100% situation, we are thinking of adding disks to the worker machines. Is that useful in our case?
If yes, how do we rebalance the data onto the new disks?
Created 02-09-2018 02:22 PM
If you want to see the usage within DFS, this should provide you with the disk usage:
hdfs dfs -du -h /
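If you also want the overall filesystem capacity and usage in one line (a quick way to confirm how full HDFS is), the df variant of the same command should work:
hdfs dfs -df -h /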
To see the size of the trash dir, use this command (with no path argument it lists your HDFS home directory, which is where .Trash lives):
hdfs dfs -du -h
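To target the trash folder directly, something like this should also work; <username> is a placeholder, and the path assumes the default trash location:
hdfs dfs -du -s -h /user/<username>/.Trash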
To add a new disk (in the normal mode), you typically decommission the DataNode service on the worker node, add the disk, and then recommission it. During decommissioning HDFS will replicate the blocks from that node to the other nodes to avoid data loss; I'm not sure whether an already full HDFS will cause errors there. Can you try to (temporarily) add nodes? That adds HDFS capacity, and with it the decommissioning of a single node should be fine, giving you a way to increase the local disk capacity.
I'm not sure whether the rebalancing needs to be triggered manually; I believe it will start automatically (causing additional load on the nodes while it runs).
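If it turns out you do need to trigger it manually, the standard tool is the HDFS balancer; a typical invocation looks like this (the threshold percentage is just an example value):
hdfs balancer -threshold 10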
Created 02-09-2018 02:40 PM
Can you please show me how to decommission, and what to do before and after?
Created 02-09-2018 02:58 PM
In Ambari, go to the host details; there you can click the action button next to the 'DataNode / HDFS' service line and choose Decommission. (See the attached screenshot: screenshot-decomission.png.)
You should turn on maintenance mode beforehand to avoid alerts.
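For reference, the equivalent outside Ambari is the exclude-file route; a sketch, assuming dfs.hosts.exclude points at the file below (both the hostname and the path are examples):
echo "worker1.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes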
Created 02-09-2018 02:44 PM
From the output I see:
hdfs dfs -du -h /
398.2 M  /app-logs
7.5 M    /apps
3.5 M    /ats
695.6 M  /hdp
0        /mapred
0        /mr-history
53.1 G   /spark2-history
0        /tmp
516.3 M  /user
Is it possible to delete the Spark history from the CLI? If not, how can I delete the Spark history from the Ambari GUI?
Created 02-09-2018 02:54 PM
For the trash dir, try also executing the command without the / at the end (so it lists your home directory, which contains .Trash).
Created 02-09-2018 03:01 PM
To delete /spark2-history:
hdfs dfs -rm -r /spark2-history/*
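Note that with trash enabled, -rm only moves the files into .Trash, which won't free space immediately; on a full cluster you may want to bypass the trash (use with care, this deletes immediately and unrecoverably):
hdfs dfs -rm -r -skipTrash /spark2-history/*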
Created 02-09-2018 02:57 PM
Yes, the trash dir is small and isn't the problem.