I am using CDH 5.4. I have ~60 GB of free disk space on each machine.
I ran a map job, and HDFS now shows as unhealthy (disk full).
I restarted my cluster, and I see no mapred jobs now.
I see that this folder is over 50G
From examining the contents of the files under this directory and its subdirectories, it seems that all the logging I am doing with log4j is being replicated here.
Using HUE, I have deleted the folders in HDFS that were my input and output folders, and also the logs under /tmp/logs/ubuntu/logs.
Can I just delete the current/finalized folder? What is the correct way to clean up?
Restart your cluster and let it sit for a couple of days. It will clear up by itself.
The same "solution" was also proposed by someone on Stack Overflow for cases where something like this happens to your cluster.
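
One possible reason the space only comes back after a while: files deleted through HUE are normally moved to the HDFS trash rather than removed right away, and DataNodes free the underlying blocks asynchronously once the NameNode drops the files. A minimal sketch for checking this, assuming the HDFS user is ubuntu (guessed from /tmp/logs/ubuntu/logs) and the default trash location /user/<user>/.Trash; the run helper and the HDFS_USER and RUN variables are made up for this example. By default it only prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Hypothetical helper script, not part of CDH.
# Assumption: the HDFS user is "ubuntu"; override with HDFS_USER=...
HDFS_USER="${HDFS_USER:-ubuntu}"

# Print a command, and execute it only when RUN=1 (dry run by default).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Capacity and DFS usage per DataNode (may need the HDFS superuser).
run hdfs dfsadmin -report

# How much space the user's trash still holds.
run hdfs dfs -du -s -h "/user/${HDFS_USER}/.Trash"

# Remove trash checkpoints older than the retention interval
# and create a new checkpoint from the current trash contents.
run hdfs dfs -expunge
```

Note that -expunge only removes checkpoints that are already past the configured retention interval, so freshly deleted data can still take a while to disappear. This does not touch the DataNode's current/finalized directory on the local disk; those block files are managed by HDFS itself.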