How to clear HDFS directories on a specific host

We have an Ambari cluster.

The following commands clear the entire cluster, but I want to clear the HDFS directories only on a specific host, not on the entire cluster!

$ hadoop namenode -format

$ hdfs namenode -format

So what are the commands to clear HDFS only on a specific host?

Michael-Bronson
1 ACCEPTED SOLUTION

Super Collaborator

You can't clear HDFS on a single host, because HDFS is a filesystem abstraction over the entire cluster.

You can clear the DataNode directories of a particular host (or format its disks), but HDFS will fill them back in: the NameNode re-creates the lost replicas on other DataNodes to restore each file's replication factor (3 by default), and the balancer and ongoing data ingestion will place blocks on that host again.
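
If you want to see which local directories "the DataNode directories of a particular host" actually are, they come from the dfs.datanode.data.dir property. The sketch below is only an illustration: to_local_dirs is a hypothetical helper name, and it assumes the property value is a comma-separated list whose entries may carry a storage-type tag such as [DISK] and a file:// scheme. It only prints paths and deletes nothing.

```shell
#!/usr/bin/env bash
# Sketch: print the local DataNode data directories configured on a host.
# Assumption: the input is the value of dfs.datanode.data.dir, a comma-separated
# list whose entries may carry a storage-type tag (e.g. [DISK]) and a file:// scheme.
# to_local_dirs is a hypothetical helper; it never deletes anything.

# to_local_dirs VALUE: print one local directory path per line.
to_local_dirs() {
  local value="$1" entry
  local -a entries
  # Split on commas with read, which avoids pathname expansion of [DISK] etc.
  IFS=',' read -r -a entries <<< "$value"
  for entry in "${entries[@]}"; do
    entry="${entry#"${entry%%[![:space:]]*}"}"  # trim leading whitespace
    entry="${entry%"${entry##*[![:space:]]}"}"  # trim trailing whitespace
    entry="${entry#\[*\]}"                      # drop a storage-type tag such as [DISK]
    entry="${entry#file://}"                    # drop the file:// scheme
    if [ -n "$entry" ]; then
      printf '%s\n' "$entry"
    fi
  done
}

# Usage on the DataNode host (needs the Hadoop client on PATH):
#   to_local_dirs "$(hdfs getconf -confKey dfs.datanode.data.dir)"
```

Printing the paths first, rather than piping them straight into a delete, keeps the destructive step a separate, deliberate action on the right host.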

Contributor

@Michael Bronson

I assume that when you mentioned rm -rf, you meant deleting the DataNode data directories.

When you delete the DataNode directories with a plain OS-level delete, the block replicas stored there are removed, so each affected block loses one replica. If the replication factor is greater than 1, those blocks show up as under-replicated until the NameNode re-replicates them onto other DataNodes (with a replication factor of 1, the blocks are simply lost).
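
To watch the under-replicated count after such a delete, `hdfs fsck /` prints a summary that includes an "Under-replicated blocks:" line. The sketch below pulls that number out; under_replicated_count is a hypothetical helper, and it assumes the summary line looks like ` Under-replicated blocks:	12 (0.5 %)` as in Hadoop 2.x/3.x output, so adjust the pattern if your version prints it differently.

```shell
#!/usr/bin/env bash
# Sketch: extract the under-replicated block count from `hdfs fsck` output.
# Assumption: the fsck summary contains a line such as
#   " Under-replicated blocks:	12 (0.5 %)"
# under_replicated_count is a hypothetical helper, not part of Hadoop.

# under_replicated_count: read fsck output on stdin and print the first
# "Under-replicated blocks:" count found (prints nothing if the line is absent).
under_replicated_count() {
  awk -F':' '
    !found && /Under-replicated blocks:/ {
      split($2, f, " ")   # $2 is e.g. "\t12 (0.5 %)"; f[1] is the count
      print f[1]
      found = 1
    }'
}

# Usage (fsck over / can take a while on a large namespace):
#   hdfs fsck / | under_replicated_count
```

The awk program keeps reading to the end of the input instead of exiting early, so the upstream `hdfs fsck` process is not cut off by a closed pipe.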

Just to be clear, because this is very important: are you sure that hdfs dfs -rmr /DirectoryPath will affect only this host and not the entire cluster?

Michael-Bronson

Contributor

@Michael Bronson

hdfs dfs -rm -r deletes the path you provide, recursively. The path is removed from the HDFS namespace, which means it is deleted from the entire HDFS cluster, not from a single host.

If the trash feature is enabled (fs.trash.interval greater than 0), the deleted files are moved to the trash directory instead of being removed immediately.
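
Whether trash is enabled can be checked from the fs.trash.interval value (in minutes; 0 disables trash). The sketch below is a hypothetical helper, trash_enabled, that only interprets that number; it assumes the value read on your client matches the cluster's effective setting, which may not hold because a non-zero NameNode-side value takes precedence over the client's.

```shell
#!/usr/bin/env bash
# Sketch: decide whether `hdfs dfs -rm` will move files to trash, based on
# fs.trash.interval (minutes; 0 disables trash).
# Assumption: the client-side value reflects the cluster's effective setting.
# trash_enabled is a hypothetical helper, not part of Hadoop.

# trash_enabled INTERVAL: succeed if INTERVAL is a number greater than zero.
trash_enabled() {
  awk -v v="$1" 'BEGIN { exit !(v + 0 > 0) }'
}

# Usage (needs the Hadoop client on PATH):
#   if trash_enabled "$(hdfs getconf -confKey fs.trash.interval)"; then
#     echo "hdfs dfs -rm -r moves data to the user's .Trash directory"
#   else
#     echo "hdfs dfs -rm -r deletes data immediately"
#   fi
#   # -skipTrash bypasses trash even when it is enabled:
#   #   hdfs dfs -rm -r -skipTrash /DirectoryPath
```

awk does the comparison so that fractional intervals such as 0.5 are handled, which plain integer tests in the shell would reject.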

For more information, see the rm command usage:

https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html#rm

The link above is for Hadoop 2.7.3.

Contributor

If you use hdfs dfs -rm -r, it deletes the files from the HDFS cluster. It affects the whole HDFS cluster, not a particular host.

Expert Contributor

@Michael Bronson To delete HDFS directories in the cluster, use the following command:

hdfs dfs -rmr /DirectoryPath

This deletes /DirectoryPath and all directories and files under it. Note that -rmr is deprecated; the current equivalent is hdfs dfs -rm -r /DirectoryPath.

So what is the difference if I just delete the folder with rm -rf?

Michael-Bronson

Expert Contributor
@Michael Bronson

rm -rf -> This is a Linux/Unix command that deletes directories only in the local Unix/Linux filesystem of the host where you run it.

Whereas

hdfs dfs -rmr /DirectoryPath -> This deletes files/directories in the HDFS filesystem, and therefore across the whole cluster.

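In case I misinterpreted your question and you meant to ask about the difference between "hdfs dfs -rmr" and "hdfs dfs -rm -rf": the HDFS rm command takes its options as separate flags, not combined Unix-style as "-rf".

The rm options in HDFS are "-r" (or "-R") to delete a directory and its contents recursively, "-f" to suppress the error message and non-zero exit status when the path does not exist, and "-skipTrash" to bypass the trash, for example: hdfs dfs -rm -r -f /DirectoryPath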