
HDFS Data Recovery

Expert Contributor

What are the best recovery options if a product like Abinitio runs an m_rm command that deletes the HDFS data in one of the environments?

These types of low-level executions bypass the Hadoop dfs -rm command, which moves deleted data into the trash folder for recovery.

The trash interval is configured for 21 days in the Hortonworks environment.

The data had to be recreated from the source files, but if this were production, what would the best recovery options be?
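For context, here is a hedged sketch (the paths are hypothetical) of the normal HDFS trash flow that makes a standard delete recoverable, and that a low-level delete skips:

```
# A standard delete moves the data under the user's trash directory:
hdfs dfs -rm -r /data/sales/2016

# Until the trash interval expires, the data can be listed and moved back:
hdfs dfs -ls /user/$USER/.Trash/Current/data/sales/
hdfs dfs -mv /user/$USER/.Trash/Current/data/sales/2016 /data/sales/2016

# A delete issued with -skipTrash, or by a client calling the NameNode
# delete RPC directly (as a low-level tool might), never lands in .Trash,
# so there is nothing to move back.
```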

1 ACCEPTED SOLUTION

Master Guru
@Kirk Haslbeck

We can use the protected directories feature in production to avoid accidental deletion of data from HDFS:

https://issues.apache.org/jira/browse/HDFS-8983
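A minimal sketch of how that might look, assuming hypothetical paths (fs.protected.directories is set in core-site.xml, for example through Ambari on a Hortonworks cluster):

```
# core-site.xml (set via Ambari), hypothetical value:
#   fs.protected.directories = /apps/hive/warehouse,/data/critical
# The check is enforced by the NameNode, so it also applies to clients
# that bypass the shell and its trash behaviour.

# Confirm the setting is visible to clients:
hdfs getconf -confKey fs.protected.directories

# Deleting a non-empty protected directory should now fail, even with
# -skipTrash (the exact error message varies by Hadoop version):
hdfs dfs -rm -r -skipTrash /data/critical
```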


3 REPLIES


Expert Contributor

What about Ranger? Can it provide protection at this level? And assuming data does get removed, are there any recovery options?

Super Guru

@Kirk Haslbeck

Ranger lets you manage authorization centrally, so you can allow or disallow access at the user level for creating or deleting directories, but it does not provide recovery options.

Please do check - http://hortonworks.com/apache/ranger/
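As a hedged illustration of the authorization side only (the Ranger host, credentials, service name, user, and path below are placeholders, and the JSON follows the Ranger public v2 REST API), a path policy can restrict who is allowed to write to, and therefore delete under, a critical HDFS directory:

```
# Hypothetical policy: give etl_user read/execute but no write on /data/critical.
curl -u admin:admin -H "Content-Type: application/json" \
  -X POST http://ranger-host:6080/service/public/v2/api/policy \
  -d '{
    "service": "cluster_hadoop",
    "name": "protect-data-critical",
    "resources": {
      "path": { "values": ["/data/critical"], "isRecursive": true }
    },
    "policyItems": [
      {
        "users": ["etl_user"],
        "accesses": [
          { "type": "read",    "isAllowed": true },
          { "type": "execute", "isAllowed": true }
        ],
        "delegateAdmin": false
      }
    ]
  }'
# Without write access, Ranger will not authorize deletes for etl_user under
# this path; but as noted above, Ranger cannot bring deleted data back.
```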