Created 11-20-2017 08:19 PM
We have an Ambari cluster. The following commands will clear the entire cluster, but I want to clear the HDFS directories only on a specific host, not on the entire cluster:
$ hadoop namenode -format
$ hdfs namenode -format
So what are the commands to clear HDFS only on a specific host?
Created 11-21-2017 09:48 PM
You can't clear HDFS on a single host, because HDFS is a filesystem abstraction over the entire cluster.
You can clear the DataNode directories on a particular host (or format the disks), but HDFS (through re-replication and the balancer) will fill them back in, depending on the cluster's other data ingestion processes and on keeping the configured number of replicas of each file (3 by default).
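As a rough sketch of how to find the DataNode directories on one host: the configured locations can be read with `hdfs getconf -confKey dfs.datanode.data.dir`, which returns a comma-separated list whose entries may carry a storage-type tag such as `[DISK]` or a `file://` prefix. The helper name `split_data_dirs` and the sample paths are hypothetical, and the sketch assumes the `file://` form of the URI. It only lists paths and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: turn the comma-separated dfs.datanode.data.dir value
# into one local path per line, dropping [DISK]-style storage tags and
# file:// prefixes. It does not touch the filesystem.
split_data_dirs() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | sed -e 's/^ *//' -e 's/ *$//' \
              -e 's/^\[[A-Za-z_]*\]//' \
              -e 's|^file://||' \
              -e '/^$/d'
}

# Usage on the host in question (requires the hdfs client on that host):
#   split_data_dirs "$(hdfs getconf -confKey dfs.datanode.data.dir)"
```

Stop the DataNode on that host before touching anything under the listed paths, and expect the cluster to re-replicate the affected blocks afterwards.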
Created 11-21-2017 01:17 AM
@Michael Bronson
I assume that when you mentioned rm -rf, you meant deleting the DataNode data directories.
When you use a normal delete on the DataNode directories, the block replicas stored there are deleted, so the replica count of each of those blocks drops by 1. If the replication factor is set greater than 1, those blocks are then reported as under-replicated.
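One way to see the effect described above is to look at the `hdfs fsck /` summary after the directories are removed. The sketch below assumes the summary contains a line of the form `Under-replicated blocks: N (P %)`, as printed by Hadoop 2.x; the function name `under_replicated_count` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read fsck output on stdin and print the number of
# under-replicated blocks from the summary line (prints nothing if the
# line is absent).
under_replicated_count() {
    sed -n 's/^[[:space:]]*Under-replicated blocks:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (requires the hdfs client and read access to the namespace):
#   hdfs fsck / | under_replicated_count
```

A non-zero count shortly after the delete is expected; it should fall again as the NameNode schedules new replicas on other hosts.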
Created 11-21-2017 05:31 AM
Just to be clear, because this is very important: are you sure that hdfs dfs -rmr /DirectoryPath will affect only this host and not the entire cluster?
Created 11-21-2017 07:56 PM
@Michael Bronson
hdfs dfs -rm -r deletes the path you provide recursively. The specified location is removed from the HDFS namespace, which means it is deleted from the entire HDFS cluster, not from a single host.
If the trash option is enabled, the deleted files are moved to the trash directory instead of being removed immediately.
For more info, you can see the rm command usage
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html#rm
The above link is for Hadoop 2.7.3 version.
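To illustrate the trash behaviour: with trash enabled, a deleted path is normally moved under the deleting user's HDFS home directory at `.Trash/Current`, keeping its original absolute path. The sketch below assumes the default home directory prefix `/user`; the function name `trash_location` and the example user and path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print where a deleted HDFS path is expected to land
# when trash is enabled, assuming home directories live under /user.
# $1 = HDFS user name, $2 = absolute HDFS path that was deleted.
trash_location() {
    printf '/user/%s/.Trash/Current%s\n' "$1" "$2"
}

# Related commands from the FileSystem shell:
#   hdfs dfs -rm -r /DirectoryPath             # moves to trash if enabled
#   hdfs dfs -rm -r -skipTrash /DirectoryPath  # bypasses trash
#   hdfs dfs -expunge                          # empties the trash
```

Either way the delete applies to the whole cluster; trash only delays the point at which the blocks are actually freed.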
Created 11-21-2017 08:17 PM
If you use hdfs dfs -rm -r, it will delete the files from the HDFS cluster. It affects the whole HDFS cluster, not a particular host.
Created 11-20-2017 10:26 PM
@Michael Bronson To delete HDFS directories in the cluster, use the command below:
hdfs dfs -rmr /DirectoryPath
This deletes all directories and files under the path /DirectoryPath.
Created 11-20-2017 10:50 PM
So what is the difference if I just delete the folder with rm -rf?
Created 11-21-2017 04:58 PM
rm -rf -> This is a Linux/Unix command, which only deletes directories created in the local Linux/Unix file system.
Whereas
hdfs dfs -rmr /DirectoryPath -> This deletes files and directories in the HDFS filesystem.
In case I misinterpreted your question and you meant to ask about the difference between "hdfs dfs -rmr" and "hdfs dfs -rm -rf", the latter is not valid as written, since the HDFS rm command does not accept a combined "-rf" flag.
In HDFS, use "-r" with the rm command to delete directories and their files recursively.