how to clear HDFS directories on specific host


We have an Ambari cluster.

The following commands will clear the entire cluster, but I want to clear the HDFS directories only on a specific host, not on the entire cluster:

$ hadoop namenode -format

$ hdfs namenode -format

So what are the commands to clear HDFS only on a specific host?

Michael-Bronson
Accepted Solution

Re: how to clear HDFS directories on specific host

Super Collaborator

You can't clear HDFS on a single host, because HDFS is a filesystem abstraction over the entire cluster.

You can clear the datanode directories of a particular host (or format its disks), but the HDFS balancer will fill them back in, depending on the cluster's other data ingestion processes and on the need to keep 3 replicas of each file.
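
To see which local directories count as "the datanode directories" on a host, a minimal sketch follows. It assumes the hdfs client is on the PATH and that the directories come from the dfs.datanode.data.dir property; parse_data_dirs is a hypothetical helper, not part of Hadoop. It only prints paths and deletes nothing.

```shell
# Sketch only: parse_data_dirs is a hypothetical helper, not a Hadoop command.
# It turns a dfs.datanode.data.dir value into one local path per line.
# Handles comma-separated entries, optional [STORAGE_TYPE] prefixes,
# and an optional file: URI scheme.
parse_data_dirs() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' \
          -e 's/[[:space:]]*$//' \
          -e 's/^\[[A-Za-z_]*\]//' \
          -e 's|^file:/*|/|' \
          -e '/^$/d'
}

# On a real DataNode host (assumption: hdfs client installed and configured):
#   parse_data_dirs "$(hdfs getconf -confKey dfs.datanode.data.dir)"
# Review the printed paths, and stop the DataNode, before clearing anything.
```

Printing the paths first keeps the destructive step manual, which seems safer given that the directories hold block replicas the rest of the cluster may still depend on.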

8 REPLIES

Re: how to clear HDFS directories on specific host

Contributor

@Michael Bronson To delete HDFS directories in the cluster, use the command below:

hdfs dfs -rmr /DirectoryPath

This deletes all directories and files under the path /DirectoryPath. (Note that -rmr is deprecated; hdfs dfs -rm -r is the current equivalent.)

Re: how to clear HDFS directories on specific host

So what is the difference if I just delete the folder with rm -rf?

Michael-Bronson

Re: how to clear HDFS directories on specific host

Contributor
@Michael Bronson

rm -rf -> This is a Linux/Unix command that deletes only directories in the local Linux/Unix file system.

Whereas

hdfs dfs -rmr /DirectoryPath -> This deletes files and directories in the HDFS filesystem.

In case I misinterpreted your question and you meant to ask about the difference between "hdfs dfs -rmr" and "hdfs dfs -rm -rf": the latter does not work as written, because the HDFS shell does not accept the combined "-rf" flag.

Use "-r" (or "-R") with the rm command in HDFS to delete directories and files recursively. In the Hadoop 2.7.3 shell documentation, "-f" is a separate option that suppresses the error when the path does not exist.

Re: how to clear HDFS directories on specific host

Cloudera Employee

@Michael Bronson

I assume that when you mentioned rm -rf, you meant deleting the datanode data directories.

When you use a normal delete to remove the datanode directories, the block data for the files stored there is deleted, and the number of replicas of those blocks drops by 1. If the replication factor is set to greater than 1, those blocks are then reported as under-replicated.
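
To check whether blocks have become under-replicated after clearing a host, something like the following sketch could help. It assumes that hdfs fsck prints a summary line beginning with "Under-replicated blocks:" (the exact label may differ between Hadoop versions); under_replicated_count is a hypothetical helper, not a Hadoop command.

```shell
# Sketch only: under_replicated_count is a hypothetical helper.
# It reads fsck-style text on stdin and prints the number found on the
# first "Under-replicated blocks:" summary line (assumed label).
under_replicated_count() {
  sed -n 's/^[[:space:]]*Under-replicated blocks:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    | head -n 1
}

# On a real cluster (assumption: hdfs client installed, enough privileges):
#   hdfs fsck / | under_replicated_count
```

If the label does not match your version's output, the function prints nothing, so an empty result should be read as "unknown", not as zero.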

Re: how to clear HDFS directories on specific host

Just to be clear, because this is very important: are you sure that hdfs dfs -rmr /DirectoryPath will affect only the host and not the entire cluster?

Michael-Bronson

Re: how to clear HDFS directories on specific host

Cloudera Employee

@Michael Bronson

hdfs dfs -rm -r recursively deletes the path you provide. The specified location is deleted from HDFS, which means it is deleted from the entire HDFS cluster.

If the trash option is enabled, the deleted files are moved to the trash directory.

For more info, you can see the rm command usage

https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html#rm

The above link is for Hadoop 2.7.3 version.
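
Because this delete is cluster-wide, it may be worth printing the command before running it. A minimal sketch, where hdfs_delete_cmd is a hypothetical helper that only builds the command text; -r and -skipTrash are options listed in the rm usage linked above.

```shell
# Sketch only: hdfs_delete_cmd is a hypothetical helper.
# It prints an hdfs delete command instead of running it.
#   $1 = HDFS path to delete
#   $2 = "yes" to bypass the trash (optional, defaults to "no")
hdfs_delete_cmd() {
  target_path="$1"
  skip_trash="${2:-no}"
  # Refuse obviously dangerous targets.
  if [ -z "$target_path" ] || [ "$target_path" = "/" ]; then
    echo "refusing to build a delete for an empty path or /" >&2
    return 1
  fi
  if [ "$skip_trash" = "yes" ]; then
    printf 'hdfs dfs -rm -r -skipTrash %s\n' "$target_path"
  else
    printf 'hdfs dfs -rm -r %s\n' "$target_path"
  fi
}

# Example: hdfs_delete_cmd /DirectoryPath
```

Without -skipTrash, and with trash enabled, the data can still be recovered from the trash directory for a while, so the default here leaves the trash in play.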

Re: how to clear HDFS directories on specific host

Cloudera Employee

If you use hdfs dfs -rm -r, it deletes the files from the HDFS cluster. It affects the whole HDFS cluster, not a particular host.
