How to clear HDFS directories on a specific host
Labels: Apache Ambari, Apache Hadoop
Created 11-20-2017 08:19 PM
We have an Ambari cluster. The following commands will clear the entire cluster, but I want to clear the HDFS directories only on a specific host, not on the entire cluster:
$ hadoop namenode -format
$ hdfs namenode -format
So what are the commands to clear HDFS only on a specific host?
Created 11-21-2017 09:48 PM
You can't clear HDFS on a single host, because HDFS is a filesystem abstraction over the entire cluster.
You can clear the DataNode directories of a particular host (or format its disks), but the HDFS balancer will fill them back in, depending on the cluster's other data ingestion processes and on the need to keep 3 replicas of each file.
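To see which local directories a DataNode host actually uses, one option is to read dfs.datanode.data.dir from the configuration on that host. The following is only a minimal sketch: it assumes the hdfs client is on the PATH and that the local configuration matches what the DataNode runs with. parse_data_dirs is a hypothetical helper written for this example, not part of Hadoop.

```shell
#!/usr/bin/env bash
# parse_data_dirs: hypothetical helper (not part of Hadoop).
# Splits a dfs.datanode.data.dir value into one local path per line,
# dropping optional storage-type tags such as [DISK] and file:// prefixes.
parse_data_dirs() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed -e 's/^ *//' -e 's/ *$//' -e 's/^\[[A-Za-z_]*]//' -e 's|^file://||'
}

# Only query the real configuration when the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
  parse_data_dirs "$(hdfs getconf -confKey dfs.datanode.data.dir)"
fi
```

The paths this prints are the only thing an rm -rf or a disk format on that host would wipe; the other hosts keep their own copies of the blocks.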
Created 11-21-2017 01:17 AM
@Michael Bronson
I assume that when you mentioned rm -rf, you meant deleting the DataNode data directories.
When you use a normal delete to remove the DataNode directories, the block data for the files stored there is deleted, and the number of replicas of each of those blocks drops by 1. If the replication factor is set to greater than 1, those blocks are then reported as under-replicated.
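One way to observe this effect is to look at the under-replicated block count that hdfs fsck reports. The sketch below assumes the hdfs client is on the PATH and that the report contains a line labelled "Under-replicated blocks:" (fsck) or "Under replicated blocks:" (dfsadmin -report); the exact wording may differ between Hadoop versions. under_replicated_count is a hypothetical helper, not a Hadoop command.

```shell
#!/usr/bin/env bash
# under_replicated_count: hypothetical helper (not part of Hadoop).
# Reads report text on stdin and prints the number that follows the first
# "Under-replicated blocks:" or "Under replicated blocks:" label.
under_replicated_count() {
  grep -i -m1 'under[- ]replicated blocks' \
    | sed -e 's/^[^:]*:[[:space:]]*//' -e 's/[^0-9].*$//'
}

# Only run the real check when the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
  hdfs fsck / | under_replicated_count
fi
```

A count above zero shortly after wiping a host's data directories would be expected; the NameNode normally schedules new copies of those blocks on other DataNodes, so the number should fall over time.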
Created 11-21-2017 05:31 AM
Just to be clear, because this is very important: are you sure that hdfs dfs -rmr /DirectoryPath will affect only the one host and not the entire cluster?
Created 11-21-2017 07:56 PM
@Michael Bronson
hdfs dfs -rm -r will recursively delete the path you provide. The specified location is deleted from HDFS itself, which means it is deleted from the entire HDFS cluster, not from one host.
If the trash option is enabled, the deleted files are moved to the trash directory instead.
For more info, see the rm command usage:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html#rm
The above link is for Hadoop version 2.7.3.
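To make the trash behaviour concrete, here is a rough sketch. It assumes the default trash layout of /user/<name>/.Trash/Current and that the value of fs.trash.interval seen by the client is the one in effect; trash_path_for is a hypothetical helper, not part of Hadoop, and /DirectoryPath is the placeholder path used earlier in this thread.

```shell
#!/usr/bin/env bash
# trash_path_for: hypothetical helper (not part of Hadoop).
# Prints where a deleted absolute HDFS path is expected to land in a
# user's trash, assuming the default /user/<name>/.Trash/Current layout.
trash_path_for() {
  printf '/user/%s/.Trash/Current%s\n' "$1" "$2"
}

if command -v hdfs >/dev/null 2>&1; then
  # Trash is enabled when fs.trash.interval (in minutes) is greater than 0.
  hdfs getconf -confKey fs.trash.interval
  # Where `hdfs dfs -rm -r /DirectoryPath` would move the data:
  trash_path_for "$(whoami)" /DirectoryPath
  # To bypass the trash and delete immediately (not run here):
  #   hdfs dfs -rm -r -skipTrash /DirectoryPath
fi
```

Either way, the deletion applies to the HDFS namespace as a whole; the trash only changes whether the data can still be restored for a while.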
Created 11-21-2017 08:17 PM
If you use hdfs dfs -rm -r, it will delete the files from the HDFS cluster. It affects the whole HDFS cluster, not a particular host.
Created 11-20-2017 10:26 PM
@Michael Bronson To delete HDFS directories in the cluster, use the command below:
hdfs dfs -rmr /DirectoryPath
This will delete all directories and files under the path /DirectoryPath.
Created 11-20-2017 10:50 PM
So what is the difference if I just delete the folder with rm -rf?
Created 11-21-2017 04:58 PM
rm -rf -> This is a Linux/Unix command, which will only delete a directory that lives in the local Linux/Unix file system.
Whereas
hdfs dfs -rmr /DirectoryPath -> This deletes files and directories in the HDFS filesystem.
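In case I misinterpreted your question and you meant to ask about the difference between "hdfs dfs -rmr" and "hdfs dfs -rm -rf": "-rmr" is the older, deprecated spelling of "-rm -r". The rm usage linked for Hadoop 2.7.3 also lists an "-f" option, but it only suppresses the error when the path does not exist, and as far as I know the options have to be passed separately ("-rm -r -f") rather than combined as "-rf".
The "-r" option is the one that makes the HDFS rm command delete directories and the files under them.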
