<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to protect HDFS directories from deletion by mistake in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118877#M81660</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/504/kkulkarni.html" nodeid="504"&gt;@Kuldeep Kulkarni&lt;/A&gt;: I am using HDP 2.3.4 with Hadoop 2.7.1.2.3.4.0-3485. Does that mean this feature is not properly supported in our HDP stack?&lt;/P&gt;</description>
    <pubDate>Thu, 28 Apr 2016 02:08:30 GMT</pubDate>
    <dc:creator>SK1</dc:creator>
    <dc:date>2016-04-28T02:08:30Z</dc:date>
    <item>
      <title>How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118875#M81658</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I was reading a KB article on protecting HDFS directories, but when I tested it I was still able to delete a protected directory.&lt;/P&gt;&lt;P&gt;I configured fs.protected.directories in core-site.xml with the /lowes/sampleTest directory and tested as below.&lt;/P&gt;&lt;P&gt;[root@samplehost ~]$ hadoop fs -rm -R -skipTrash /lowes/sampleTest&lt;/P&gt;&lt;P&gt;rm: Cannot delete non-empty protected directory /lowes/sampleTest&lt;/P&gt;&lt;P&gt;[root@samplehost ~]$ hadoop fs -rm -R /lowes/sampleTest&lt;/P&gt;&lt;P&gt;16/04/27 05:50:15 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.&lt;/P&gt;&lt;P&gt;Moved: 'hdfs://HDPINFHA/lowes/sampleTest' to trash at: hdfs://HDPINFHA/user/root/.Trash/Current&lt;/P&gt;&lt;P&gt;Could you help with this?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:15:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118875#M81658</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2022-09-16T10:15:56Z</dc:date>
    </item>
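For reference, the setup described in the question corresponds to a core-site.xml fragment like the following sketch (the path is the one from the post; the behavior note reflects what this thread observes on Hadoop 2.7.x):

```xml
<!-- Sketch of the fs.protected.directories setting discussed above.
     Non-empty directories listed here cannot be deleted with -skipTrash;
     as this thread observes, a plain "hadoop fs -rm -r" still succeeds,
     because the move to trash is a rename rather than a delete. -->
<property>
  <name>fs.protected.directories</name>
  <value>/lowes/sampleTest</value>
</property>
```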
    <item>
      <title>Re: How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118876#M81659</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt; - Which version of HDP are you using? I see that the protected-directories feature is in Hadoop 2.8.0:&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/HDFS-8983" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-8983&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 23:03:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118876#M81659</guid>
      <dc:creator>KuldeepK</dc:creator>
      <dc:date>2016-04-27T23:03:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118877#M81660</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/504/kkulkarni.html" nodeid="504"&gt;@Kuldeep Kulkarni&lt;/A&gt;: I am using HDP 2.3.4 with Hadoop 2.7.1.2.3.4.0-3485. Does that mean this feature is not properly supported in our HDP stack?&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2016 02:08:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118877#M81660</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2016-04-28T02:08:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118878#M81661</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt; - I just checked, and 2.3.4 has &lt;A href="https://issues.apache.org/jira/browse/HDFS-8983"&gt;HDFS-8983&lt;/A&gt; implemented in it. I will try to reproduce the issue and keep you posted.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2016 02:27:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118878#M81661</guid>
      <dc:creator>KuldeepK</dc:creator>
      <dc:date>2016-04-28T02:27:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118879#M81662</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/2273/saurabhmcakiet.html" nodeid="2273"&gt;@Saurabh Kumar&lt;/A&gt;&lt;P&gt;Can you please try to delete hdfs://HDPINFHA/user/root/.Trash/Current/lowes/sampleTest ?&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2016 02:30:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118879#M81662</guid>
      <dc:creator>KuldeepK</dc:creator>
      <dc:date>2016-04-28T02:30:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118880#M81663</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/504/kkulkarni.html" nodeid="504"&gt;@Kuldeep Kulkarni&lt;/A&gt;: I am able to delete the directory from the trash as well.&lt;/P&gt;&lt;P&gt;[root@samplehost ~]$ hadoop fs -rmr hdfs://HDPINFHA/user/root/.Trash/Current/lowes/sampleTest&lt;/P&gt;&lt;P&gt;rmr: DEPRECATED: Please use 'rm -r' instead.&lt;/P&gt;&lt;P&gt;16/04/29 03:07:06 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.&lt;/P&gt;&lt;P&gt;Deleted hdfs://HDPINFHA/user/root/.Trash/Current/lowes/sampleTest&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 14:05:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118880#M81663</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2016-04-29T14:05:55Z</dc:date>
    </item>
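The exchange above follows the default HDFS trash convention: a trashed path is relocated under /user/&lt;user&gt;/.Trash/Current with its original path appended, which is why the protected directory reappears at a deletable location. A minimal Python sketch of that mapping (illustrative only, not the HDFS implementation):

```python
def trash_path(user: str, original_path: str) -> str:
    """Map an HDFS path to its default trash location:
    /user/<user>/.Trash/Current/<original path>, the convention shown in
    this thread (/lowes/sampleTest -> /user/root/.Trash/Current/lowes/sampleTest)."""
    return f"/user/{user}/.Trash/Current/{original_path.lstrip('/')}"

print(trash_path("root", "/lowes/sampleTest"))
# /user/root/.Trash/Current/lowes/sampleTest
```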
    <item>
      <title>Re: How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118881#M81664</link>
      <description>&lt;P&gt;You can also use HDFS snapshots to protect data from user errors: &lt;A href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html" target="_blank"&gt;https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 14:42:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118881#M81664</guid>
      <dc:creator>ahadjidj</dc:creator>
      <dc:date>2016-04-29T14:42:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118882#M81665</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2056/ahadjidj.html" nodeid="2056"&gt;@Abdelkrim Hadjidj&lt;/A&gt;: Yes, you are right. Right now we are using snapshots in all our clusters. But when I saw this functionality I was curious about it, which is why I posted my question.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 17:33:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118882#M81665</guid>
      <dc:creator>SK1</dc:creator>
      <dc:date>2016-04-29T17:33:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to protect HDFS directories from deletion by mistake</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118883#M81666</link>
      <description>&lt;P&gt;This is a good article by our intern James Medel on protecting against accidental deletion:&lt;/P&gt;&lt;H1&gt;USING HDFS SNAPSHOTS TO PROTECT IMPORTANT ENTERPRISE DATASETS&lt;/H1&gt;&lt;P&gt;Some time back, we introduced the ability to create snapshots to protect important enterprise data sets from user or application errors.&lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_user-guide/content/user-guide-hdfs-snapshots.html"&gt;HDFS Snapshots&lt;/A&gt; are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system and are:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Performant and reliable: snapshot creation is atomic and instantaneous, no matter the size or depth of the directory subtree&lt;/LI&gt;&lt;LI&gt;Scalable: snapshots do not create extra copies of blocks on the file system. Snapshots are highly optimized in memory and stored along with the NameNode’s file system namespace&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;In this blog post we’ll walk through how to administer and use HDFS snapshots.&lt;/P&gt;&lt;H2&gt;ENABLE SNAPSHOTS&lt;/H2&gt;&lt;P&gt;In an example scenario, Web Server logs are being loaded into HDFS on a daily basis for processing and long-term storage. The logs are loaded a few times a day, and the dataset is organized into directories that each hold one day's log files in HDFS. Since the Web Server logs are stored only in HDFS, it’s imperative that they are protected from deletion.&lt;/P&gt;&lt;P&gt;/data/weblogs&lt;/P&gt;&lt;P&gt;/data/weblogs/20130901&lt;/P&gt;&lt;P&gt;/data/weblogs/20130902&lt;/P&gt;&lt;P&gt;/data/weblogs/20130903&lt;/P&gt;&lt;P&gt;In order to provide data protection and recovery for the Web Server log data, snapshots are enabled for the parent directory:&lt;/P&gt;&lt;P&gt;hdfs dfsadmin -allowSnapshot /data/weblogs&lt;/P&gt;&lt;P&gt;Snapshots need to be explicitly enabled for directories. This provides system administrators with the level of granular control they need to manage data in HDP.&lt;/P&gt;&lt;H2&gt;TAKE POINT-IN-TIME SNAPSHOTS&lt;/H2&gt;&lt;P&gt;The following command creates a point-in-time snapshot of the /data/weblogs directory and its subtree:&lt;/P&gt;&lt;P&gt;hdfs dfs -createSnapshot /data/weblogs&lt;/P&gt;&lt;P&gt;This will create a snapshot and give it a default name matching the timestamp at which the snapshot was created. Users can provide an optional snapshot name instead of the default. With the default name, the created snapshot path will be /data/weblogs/.snapshot/s20130903-000941.091. Users can schedule a cron job to create snapshots at regular intervals. 
For example, the cron entry 30 18 * * * rm /home/someuser/tmp/* tells the system to delete the contents of the tmp folder at 18:30 every day. Similarly, to pair cron with HDFS snapshots, the entry 30 18 * * * hdfs dfs -createSnapshot /data/weblogs schedules a snapshot of /data/weblogs each day at 18:30.&lt;/P&gt;&lt;P&gt;To view the state of the directory at the recently created snapshot:&lt;/P&gt;&lt;P&gt;hdfs dfs -ls /data/weblogs/.snapshot/s20130903-000941.091&lt;/P&gt;&lt;P&gt;Found 3 items&lt;/P&gt;&lt;P&gt;drwxr-xr-x   - web hadoop          0 2013-09-01 23:59 /data/weblogs/.snapshot/s20130903-000941.091/20130901&lt;/P&gt;&lt;P&gt;drwxr-xr-x   - web hadoop          0 2013-09-02 00:55 /data/weblogs/.snapshot/s20130903-000941.091/20130902&lt;/P&gt;&lt;P&gt;drwxr-xr-x   - web hadoop          0 2013-09-03 23:57 /data/weblogs/.snapshot/s20130903-000941.091/20130903&lt;/P&gt;&lt;H2&gt;RECOVER LOST DATA&lt;/H2&gt;&lt;P&gt;As new data is loaded into the web logs dataset, there could be an erroneous deletion of a file or directory. For example, an application could delete the set of logs pertaining to Sept 2nd, 2013, stored in the /data/weblogs/20130902 directory.&lt;/P&gt;&lt;P&gt;Since /data/weblogs has a snapshot, the snapshot protects the underlying file blocks from being removed from the file system. 
A deletion will only modify the metadata to remove /data/weblogs/20130902 from the working directory.&lt;/P&gt;&lt;P&gt;To recover from this deletion, the data is restored by copying it from the snapshot path:&lt;/P&gt;&lt;P&gt;hdfs dfs -cp /data/weblogs/.snapshot/s20130903-000941.091/20130902 /data/weblogs/&lt;/P&gt;&lt;P&gt;This restores the lost set of files to the working data set:&lt;/P&gt;&lt;P&gt;hdfs dfs -ls /data/weblogs&lt;/P&gt;&lt;P&gt;Found 3 items&lt;/P&gt;&lt;P&gt;drwxr-xr-x   - web hadoop          0 2013-09-01 23:59 /data/weblogs/20130901&lt;/P&gt;&lt;P&gt;drwxr-xr-x   - web hadoop          0 2013-09-04 12:10 /data/weblogs/20130902&lt;/P&gt;&lt;P&gt;drwxr-xr-x   - web hadoop          0 2013-09-03 23:57 /data/weblogs/20130903&lt;/P&gt;&lt;P&gt;Since snapshots are read-only, HDFS will also protect against user or application deletion of the snapshot data itself. The following operation will fail:&lt;/P&gt;&lt;P&gt;hdfs dfs -rmdir /data/weblogs/.snapshot/s20130903-000941.091/20130902&lt;/P&gt;&lt;H2&gt;NEXT STEPS&lt;/H2&gt;&lt;P&gt;With &lt;A href="http://hortonworks.com/products/hdp/"&gt;HDP 2.1&lt;/A&gt;, you can use snapshots to protect your enterprise data from accidental deletion, corruption and errors. &lt;A href="http://hortonworks.com/products/hdp-2/#install"&gt;Download HDP to get started&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Apr 2017 23:10:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-protect-HDFS-directories-from-deletion-by-mistake/m-p/118883#M81666</guid>
      <dc:creator>sburagohain</dc:creator>
      <dc:date>2017-04-28T23:10:19Z</dc:date>
    </item>
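The snapshot walkthrough above uses two path conventions: a default snapshot name derived from the creation timestamp (e.g. s20130903-000941.091), and snapshot contents addressed under &lt;dir&gt;/.snapshot/&lt;name&gt; with their original relative paths. A small Python sketch of those conventions (the name format is inferred from the example in the post):

```python
from datetime import datetime

def default_snapshot_name(ts: datetime) -> str:
    """Default HDFS snapshot name: 's' + creation timestamp with millisecond
    suffix, e.g. s20130903-000941.091 (format inferred from the post above)."""
    return ts.strftime("s%Y%m%d-%H%M%S.") + f"{ts.microsecond // 1000:03d}"

def snapshot_path(snapshottable_dir: str, snapshot_name: str, subpath: str = "") -> str:
    """Snapshots live under <dir>/.snapshot/<name>; entries keep their
    paths relative to the snapshottable directory."""
    base = f"{snapshottable_dir.rstrip('/')}/.snapshot/{snapshot_name}"
    return f"{base}/{subpath.lstrip('/')}" if subpath else base

name = default_snapshot_name(datetime(2013, 9, 3, 0, 9, 41, 91000))
print(name)                                      # s20130903-000941.091
print(snapshot_path("/data/weblogs", name, "20130902"))
# /data/weblogs/.snapshot/s20130903-000941.091/20130902
```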
  </channel>
</rss>

