Created 05-25-2016 01:43 PM
I have some questions around HDFS snapshots which can be used for backup and DR purposes.
Apologies if some of the questions don't make sense. I am still trying to understand these concepts from the ground up.
Created 05-25-2016 04:05 PM
Answers inline.
Created 05-25-2016 02:28 PM
Wow, a TON of questions around snapshots; I'll try to hit on most of them. It sounds like you might have already found these older posts on this topic: http://hortonworks.com/blog/snapshots-for-hdfs/ and http://hortonworks.com/blog/protecting-your-enterprise-data-with-hdfs-snapshots/.
For DR (data onto another cluster) you'll need to export these snapshots with a tool like distcp. As you go up into the Hive and HBase stacks, you have some other tools and options in addition to this. My recommendation is to open a dedicated HCC question for each after you do a little research, and we can all jump in to help with anything you don't understand.
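As a rough sketch of such an export, the bash function below copies the read-only contents of one snapshot to another cluster with distcp, which gives a consistent point-in-time copy even while the live directory keeps changing. It assumes hadoop is on the PATH of a node that can reach both clusters; the nn1/nn2 NameNode URIs and the /backup/testsnaps target in the usage line are hypothetical placeholders, and the run helper is only an illustrative dry-run switch.

```bash
#!/usr/bin/env bash
# Sketch: copy one HDFS snapshot to another cluster with distcp.
# Hypothetical placeholders: the NameNode URIs and target path passed in.

# Print commands instead of running them when DRY_RUN is set (illustrative helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# export_snapshot SRC_URI SNAPSHOT DST_URI
# Copies the contents of SRC_URI/.snapshot/SNAPSHOT into DST_URI.
# With -update, distcp copies the directory's contents into DST_URI;
# without it, an existing DST_URI would get a SNAPSHOT subdirectory instead.
export_snapshot() {
  local src=$1 snap=$2 dst=$3
  run hadoop distcp -update "$src/.snapshot/$snap" "$dst"
}
```

For example, DRY_RUN=1 export_snapshot hdfs://nn1:8020/user/root/testsnaps snap1 hdfs://nn2:8020/backup/testsnaps prints the distcp command; drop DRY_RUN to actually run it.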
As with all things, the best way to find out is to give it a try. As the next bit shows, you cannot delete a snapshot the "normal" way (with rm); you have to use the dedicated -deleteSnapshot command.
[root@sandbox ~]# hdfs dfs -mkdir testsnaps
[root@sandbox ~]# hdfs dfs -put /etc/group testsnaps/
[root@sandbox ~]# hdfs dfs -ls testsnaps
Found 1 items
-rw-r--r-- 3 root hdfs 1196 2016-05-25 14:18 testsnaps/group
[root@sandbox ~]# su - hdfs
[hdfs@sandbox ~]$ hdfs dfsadmin -allowSnapshot /user/root/testsnaps
Allowing snaphot on /user/root/testsnaps succeeded
[hdfs@sandbox ~]$ exit
logout
[root@sandbox ~]# hdfs dfs -createSnapshot /user/root/testsnaps snap1
Created snapshot /user/root/testsnaps/.snapshot/snap1
[root@sandbox ~]# hdfs dfs -ls testsnaps/.snapshot/snap1
Found 1 items
-rw-r--r-- 3 root hdfs 1196 2016-05-25 14:18 testsnaps/.snapshot/snap1/group
[root@sandbox ~]# hdfs dfs -rmr -skipTrash /user/root/testsnaps/.snapshot/snap1
rmr: DEPRECATED: Please use 'rm -r' instead.
rmr: Modification on a read-only snapshot is disallowed
[root@sandbox ~]# hdfs dfs -deleteSnapshot /user/root/testsnaps snap1
[root@sandbox ~]# hdfs dfs -ls testsnaps/.snapshot
[root@sandbox ~]#
There is no auto-delete of snapshots. The rule of thumb is that if you create them (likely with an automated process), you need a complementary process to delete them, because snapshots can tie up a lot of HDFS space when the directory you are snapshotting actually does change.
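One possible shape for that complementary process, sketched as a bash function you could call from cron: take a timestamped snapshot, then delete every snapshot beyond the newest KEEP. It assumes every snapshot under the directory was created by this job, so the s%Y%m%d-%H%M%S names sort chronologically; the function names and the naming scheme are illustrative, not a Hadoop convention.

```bash
#!/usr/bin/env bash
# Sketch of a snapshot rotation job. Assumes all snapshots in DIR use the
# hypothetical "sYYYYmmdd-HHMMSS" names below, so they sort oldest-to-newest.

# Print commands instead of running them when DRY_RUN is set (illustrative helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Read snapshot names on stdin; print the ones outside the newest KEEP.
snapshots_to_prune() {
  local keep=$1
  sort -r | tail -n +"$((keep + 1))"
}

# Print the snapshot names under DIR/.snapshot (each one is a directory entry).
list_snapshots() {
  hdfs dfs -ls "$1/.snapshot" | awk '$1 ~ /^d/ { n = split($NF, p, "/"); print p[n] }'
}

# rotate_snapshots DIR KEEP
# Note: list_snapshots always queries HDFS, even under DRY_RUN.
rotate_snapshots() {
  local dir=$1 keep=$2 snap
  run hdfs dfs -createSnapshot "$dir" "$(date +s%Y%m%d-%H%M%S)"
  list_snapshots "$dir" | snapshots_to_prune "$keep" | while read -r snap; do
    run hdfs dfs -deleteSnapshot "$dir" "$snap"
  done
}
```

For example, rotate_snapshots /user/root/testsnaps 7 keeps the seven most recent snapshots. Pruning by count keeps the logic to a plain sort; a retention policy based on age would need date parsing instead.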
Snapshots should not adversely affect your quotas, with the one exception I just called out: a file you delete from the live directory keeps occupying HDFS space as long as one or more snapshots still reference it.
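To see which deleted items a snapshot is still holding on to, hdfs snapshotDiff can compare a snapshot with the current state of the directory ("." denotes the current state). The sketch below keeps only the deletion entries. It assumes the report prints one "type path" entry per line with "-" marking deletions, and it does not handle paths containing spaces; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list paths deleted from DIR since SNAPSHOT was taken. Their blocks
# are still referenced by the snapshot, so they keep using space until every
# snapshot that references them is deleted.
# Assumes "-<whitespace>path" lines mark deletions in the snapshotDiff report.

# Filter a snapshotDiff report (on stdin) down to the deleted paths.
deleted_paths() {
  awk '$1 == "-" { print $2 }'
}

# pinned_deletions DIR SNAPSHOT  ("." = current state of DIR)
pinned_deletions() {
  local dir=$1 snap=$2
  hdfs snapshotDiff "$dir" "$snap" . | deleted_paths
}
```

For example, after deleting testsnaps/group while snap1 still exists, pinned_deletions /user/root/testsnaps snap1 would be expected to list ./group.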
Have fun playing around with snapshots & good luck!
Created 05-28-2016 12:22 AM
If you're using the current distcp for DR (i.e., using distcp to copy data from one cluster to your backup cluster), you can use snapshots to do incremental backups and so improve distcp performance and efficiency. More specifically, you take snapshots on both the source and the backup cluster and pass the -diff option to distcp (together with -update, which -diff requires). Then, instead of blindly copying all the data, distcp first computes the difference between the two given snapshots and copies only that difference to the backup cluster.
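A minimal sketch of one incremental cycle, assuming both directories are already snapshottable (hdfs dfsadmin -allowSnapshot), the backup directory already has a snapshot named OLD whose contents match the source's OLD snapshot (for example from an initial full copy), and nothing has modified the backup directory since. The function name, NameNode URIs, and snapshot names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of one incremental DR cycle with distcp -diff.
# Preconditions (assumed, not checked): SRC and DST are snapshottable, DST has
# a snapshot OLD matching SRC's OLD snapshot, and DST is unchanged since then.

# Print commands instead of running them when DRY_RUN is set (illustrative helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# incremental_sync SRC DST OLD NEW
incremental_sync() {
  local src=$1 dst=$2 old=$3 new=$4
  # Freeze the current state of the source.
  run hdfs dfs -createSnapshot "$src" "$new" || return 1
  # Copy only what changed between OLD and NEW (-diff is only valid with -update).
  run hadoop distcp -update -diff "$old" "$new" "$src" "$dst" || return 1
  # Snapshot the target under the same name so NEW can be the next run's OLD.
  run hdfs dfs -createSnapshot "$dst" "$new"
}
```

For example, incremental_sync hdfs://nn1:8020/data hdfs://nn2:8020/data s1 s2; on the next run, s2 becomes OLD. Creating the matching snapshot on the backup side after each copy is what keeps the precondition true for the following cycle.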
Yes. If you have not skipped the trash, the file is moved to the trash, and meanwhile you can still access it through the corresponding snapshot path.
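Since the snapshot copy stays readable, restoring a deleted file is an ordinary copy out of the .snapshot path. A small sketch with hypothetical helper names; the directory, snapshot, and file names in the usage line reuse the ones from the earlier sandbox transcript.

```bash
#!/usr/bin/env bash
# Sketch: restore a file that was deleted (or moved to the trash) from DIR,
# using the read-only copy kept in DIR/.snapshot/SNAPSHOT.

# Print commands instead of running them when DRY_RUN is set (illustrative helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# snapshot_path DIR SNAPSHOT REL -> DIR/.snapshot/SNAPSHOT/REL
snapshot_path() {
  printf '%s/.snapshot/%s/%s\n' "$1" "$2" "$3"
}

# restore_from_snapshot DIR SNAPSHOT REL
restore_from_snapshot() {
  local dir=$1 snap=$2 rel=$3
  # -cp only reads from the snapshot; the snapshot itself is left untouched.
  run hdfs dfs -cp "$(snapshot_path "$dir" "$snap" "$rel")" "$dir/$rel"
}
```

For example, restore_from_snapshot /user/root/testsnaps snap1 group. If the file went to the trash rather than being removed with -skipTrash, it is also under /user/root/.Trash/Current/user/root/testsnaps/group until the trash is emptied.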
No. If the file belongs to a snapshot (i.e., it was created before the snapshot was taken), deleting it will not release quota. You may have to delete some old snapshots or increase your quota limit. Also, in some older Hadoop versions snapshots can affect namespace quota usage in unexpected ways; for example, deleting a file can sometimes increase quota usage. This has been fixed in the latest version of HDP.
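To check where a directory stands against its space quota, hdfs dfs -count -q reports the quotas, the remaining quotas, and the content size. The sketch below derives the used space quota from that output. It assumes the default column order (QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME) and raw byte values, i.e. no -h flag; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report how much of a directory's space quota is in use. Space quota
# counts replicated bytes, including blocks kept alive only by snapshots.

# Read one line of "hdfs dfs -count -q" output on stdin and print the used
# space quota in bytes, or "none" when no space quota is set.
space_quota_used() {
  awk '{ if ($3 == "none") print "none"; else printf "%.0f\n", $3 - $4 }'
}

# space_quota_report DIR
space_quota_report() {
  hdfs dfs -count -q "$1" | space_quota_used
}
```

If the used figure is close to the limit, the two options just mentioned apply: delete old snapshots with hdfs dfs -deleteSnapshot, or, as the HDFS superuser, raise the limit with hdfs dfsadmin -setSpaceQuota followed by the byte count and the directory.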