I am running extensive experiments on my 3-node (VM) cluster. Each VM has 50 GB of disk space, and checking the available space on the NameNode's UI (localhost:9870) after 10 spark-submit application submissions reveals that the hard disks are almost full. How can I delete that created data without restarting and reformatting HDFS?
I was thinking of a DataNode cleanup command to use here.
You can remove the data from HDFS using the following commands:
#hdfs dfs -rm -R -skipTrash <Extra-Data-folder>
#hdfs dfs -rm -r /tmp/spark
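Before deleting anything, it helps to see which directories are actually consuming the space. A minimal sketch, assuming the default scratch locations (`/tmp/spark` and the per-user `.sparkStaging` directory are assumptions — verify them against the `-du` output before removing):

```shell
# Cluster-wide capacity summary (configured capacity, DFS used, remaining)
hdfs dfsadmin -report | head -n 20

# Space used by each top-level HDFS directory, human-readable
hdfs dfs -du -h /

# Remove Spark scratch/staging data, bypassing the trash so space is
# reclaimed immediately (paths are assumptions -- confirm first)
hdfs dfs -rm -r -skipTrash /tmp/spark
hdfs dfs -rm -r -skipTrash /user/$USER/.sparkStaging
```

Note that `-skipTrash` frees the blocks immediately; without it, deleted files sit in `/user/<user>/.Trash` and continue to occupy disk until the trash interval expires.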
This issue is caused by too many DataNodes having high disk utilization, which reduces the number of DataNodes available for write requests.
As a result, the DataNodes that are still available for writes are targeted at a higher rate, increasing their transceiver activity to the point of being "overloaded".
Hopefully the provided solution will help resolve the issue.
Thanks, but I want to remove the data produced by running Spark applications through the spark-submit command, not my own HDFS data. Could you confirm those are the commands to use in this case?