
How to create & recover a snapshot


Hi guys,
I have been using an AWS machine for practice. My question is:
How do I create and recover a snapshot?
Please explain.
Thanks,
Mudassar Hussain


11 REPLIES

Master Mentor

In this example I am using a Cloudera VirtualBox VM, but the commands are the same on HDP. I didn't create a local user mudassar; I used the hdfs user and the HDFS directory.

#################################

# Whoami

#################################

# id
uid=0(root) gid=0(root) groups=0(root) 

#################################

# Switched to hdfs user

#################################

# su - hdfs 

#################################

# Changed directory to tmp to create test files

#################################

$ cd /tmp 

#################################

# Created first file

#################################

$ echo "This is the first text file for mudassar to test snapshot" > text1.txt 

#################################

# Created second file

#################################

$ echo "Apache Kafka comes with a pluggable authorizer known as Kafka Authorization Command Line (ACL) Interface, which is used for defining users and allowing or denying them to access its various APIs. The default behavior is that only a superuser is allowed to access all the resources of the Kafka cluster, and no other user can access those resources if no proper ACL is defined for those users. The general format in which Kafka ACL is defined is as follows" > text2.txt

#################

# Here is output

################

 $ ls -lrt 
-rw-rw-r-- 1 hdfs         hdfs             58 Mar  8 00:34 text1.txt 
-rw-rw-r-- 1 cloudera     hdfs            456 Mar  8 00:37 text2.txt 

#################################

# Above I changed the owner of text2.txt to cloudera as root user in the local filesystem

#################################

# chown cloudera:hdfs /tmp/text2.txt 

################################

# First create a target directory in hdfs

################################

hdfs dfs -mkdir /user/mudassar 

################################

# Make the directory snapshottable

#################################

hdfs dfsadmin -allowSnapshot  /user/mudassar 

output:
Allowing snaphot on /user/mudassar succeeded

#################################

# Check the snapshottable dir in HDFS

#################################

hdfs lsSnapshottableDir 

Output

drwxr-xr-x 0 hdfs supergroup 0 2018-03-08 00:54 0 65536 /user/mudassar
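(A note on the two extra numeric columns in this output, as I read them:)

# 0     = current number of snapshots under /user/mudassar
# 65536 = snapshot quota, the per-directory maximum number of snapshots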

#################################

# Copy a file from local to the HDFS snapshottable directory

#################################

hdfs dfs -put /tmp/text1.txt  /user/mudassar 

#################################

# Validate the file was copied

#################################

hdfs dfs -ls /user/mudassar 

Found 1 items 
-rw-r--r-- 1 hdfs supergroup 58 2018-03-08 00:55 /user/mudassar/text1.txt 

#################################

# create snapshot

#################################

hdfs dfs -createSnapshot /user/mudassar 

Output:
Created snapshot /user/mudassar/.snapshot/s20180308-005619.181
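(Side note: by default -createSnapshot generates a timestamp-based name like the one above, but you can also pass an explicit snapshot name as a second argument, which makes later recovery commands easier to read; before-changes below is just an illustrative name:)

hdfs dfs -createSnapshot /user/mudassar before-changes 
hdfs dfs -ls /user/mudassar/.snapshot/before-changes 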

#################################

# Check to see the snapshot

#################################

hdfs dfs -ls /user/mudassar/.snapshot 

Output:
Found 1 items 
drwxr-xr-x - hdfs supergroup 0 2018-03-08 00:56 /user/mudassar/.snapshot/s20180308-005619.181 

#################################

# copy another file to the directory

#################################

hdfs dfs -put /tmp/text2.txt  /user/mudassar 

#################################

# Check the files exist in /user/mudassar; notice the timestamps, permissions, etc.

#################################

hdfs dfs -ls /user/mudassar 

output

Found 2 items

-rw-r--r-- 1 hdfs supergroup 58 2018-03-08 00:55 /user/mudassar/text1.txt

-rw-r--r-- 1 hdfs supergroup 456 2018-03-08 00:58 /user/mudassar/text2.txt

#################################

# Changed ownership of one of the files

#################################

hdfs dfs -chown cloudera:supergroup /user/mudassar/text2.txt 

#################################

# Create second snapshot

#################################

hdfs dfs -createSnapshot /user/mudassar 

#################################

# Check the directory; notice we now have 2 snapshots: one contains ONLY text1.txt and the other contains both files

#################################

$ hdfs dfs -ls /user/mudassar/.snapshot 

Output

Found 2 items

drwxr-xr-x - hdfs supergroup 0 2018-03-08 00:56 /user/mudassar/.snapshot/s20180308-005619.181

drwxr-xr-x - hdfs supergroup 0 2018-03-08 01:01 /user/mudassar/.snapshot/s20180308-010152.924 

#################################

# Simulate accidental deletion of the files

#################################

hdfs dfs -rm  /user/mudassar/* 

output

18/03/08 01:06:26 INFO fs.TrashPolicyDefault: Moved: 'hdfs://quickstart.cloudera:8020/user/mudassar/text1.txt' to trash at: hdfs://quickstart.cloudera:8020/user/hdfs/.Trash/Current/user/mudassar/text1.txt

18/03/08 01:06:26 INFO fs.TrashPolicyDefault: Moved: 'hdfs://quickstart.cloudera:8020/user/mudassar/text2.txt' to trash at: hdfs://quickstart.cloudera:8020/user/hdfs/.Trash/Current/user/mudassar/text2.txt
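(An aside on the trash behavior above: because trash is enabled, -rm only moved the files, so they could also have been recovered from /user/hdfs/.Trash. To make the deletion permanent for a demo, you could add the -skipTrash option:)

hdfs dfs -rm -skipTrash /user/mudassar/* 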

#################################

# Validate the files were deleted

#################################

hdfs dfs -ls /user/mudassar
This returns nothing, but the directory still exists; to confirm, run hdfs dfs -ls /user/ 

#################################

# Check the contents of the latest snapshot, xxxxx.924 (see above)

#################################

hdfs dfs -ls -R /user/mudassar/.snapshot/s20180308-010152.924 

output

-rw-r--r-- 1 hdfs supergroup 58 2018-03-08 00:55 /user/mudassar/.snapshot/s20180308-010152.924/text1.txt

-rw-r--r-- 1 cloudera supergroup 456 2018-03-08 00:58 /user/mudassar/.snapshot/s20180308-010152.924/text2.txt

#################################

# Recover the 2 files from the snapshot using the -ptopax option

#################################

hdfs dfs -cp -ptopax /user/mudassar/.snapshot/s20180308-010152.924/* /user/mudassar/ 
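(A quick breakdown of the -ptopax option, as I understand the flag letters:)

# -p preserves file attributes on copy; each letter selects one attribute: 
#   t = timestamps, o = ownership, p = permissions, a = ACLs, x = XAttrs 
# a plain -cp without -p restores the contents but resets owner and timestamps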

#################################

# Validate the files were restored with original timestamps, ownership, permissions, ACLs and XAttrs

#################################

hdfs dfs -ls /user/mudassar 

output

-rw-r--r-- 1 hdfs supergroup 58 2018-03-08 00:55 /user/mudassar/text1.txt 
-rw-r--r-- 1 cloudera supergroup 456 2018-03-08 00:58 /user/mudassar/text2.txt 

There you are: the 2 files that were accidentally deleted have been recovered. Please let me know if that worked out for you.

Master Mentor
@Mudassar Hussain

Any updates?

Master Mentor

@Mudassar Hussain

You haven't given your feedback on the method and solution I provided. You should understand that members go a long way to help out, and feedback on whether the solution resolved your problem is appreciated; in that case you accept the answer to reward the user and close the thread, so that others with a similar problem can use it as a SOLUTION.


Hi @Geoffrey Shelton Okot, I am really sorry; actually I was stuck in another task (deadline). I know that when someone comes and helps you it is a great thing, and you always help us a lot. Last time my "ACLs" topic was covered thanks to your help.
I have not tried this solution yet, but in the coming days I will definitely try it and let you know.
Many thanks

Master Mentor

@Mudassar Hussain

No worries, I was just worried you got stuck and didn't revert; the HCC is full of solutions. Don't hesitate to update the thread if you encounter any problems, but if the solution provided resolves your issue then that's great, and what you need to do is accept the answer and close the thread.

Cheers


Thanks a lot @Geoffrey Shelton Okot
I have tried this solution and got the output, but I am not satisfied with myself on this topic, so I want to do another example.
Please help me out. I am using an Amazon machine (Hortonworks):
First of all I create the user "mudassar",
then change the ownership with this command:
1. "sudo -u hdfs hadoop fs -chown mudassar:mudassar /user/mudassar"
then switch to user "mudassar":
2. sudo su - mudassar
then I create a directory (snapdemo) under this user:
3. hadoop fs -mkdir /user/mudassar/snapdemo
then I create another directory (directory2) under this user:
4. hadoop fs -mkdir /user/mudassar/directory2
Now I have 2 directories under this user. Please see the screen below:

[screenshot: 64618-snapshot.jpg]

1. Now I want to create a text file (text1.txt) in the "snapdemo" directory, and then take a snapshot of this directory into the other directory (directory2) of this user.
2. Then delete the file (text1.txt) from the "snapdemo" directory and recover the snapshot from "directory2".
Hope the above makes sense to you.
Thanks again
Cheers

Master Mentor

@Mudassar Hussain

Steps 1, 2 and 3 are okay. You don't need to create directory2: when you enable a directory as snapshottable, in this case /user/mudassar/snapdemo, the snapshots are created under that directory in a hidden .snapshot subdirectory, which is why it doesn't show up when you run the hdfs dfs -ls command.
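(You can see this for yourself once the demo below is in place; the paths here assume that demo:)

$ hdfs dfs -ls /user/mudassar/snapdemo             # .snapshot does not appear in the listing 
$ hdfs dfs -ls /user/mudassar/snapdemo/.snapshot   # but it can be listed by naming it explicitly 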

Let me demo this on HDP 2.6.

I will create your user on my local environment.

Add the user mudassar, as the root user

# adduser mudassar 

Switch to the hdfs user, the HDFS superuser and owner

# su - hdfs 

Create the snapdemo directory; notice the -p option, since the parent directory /user/mudassar doesn't exist yet. Note: the hadoop dfs command is deprecated, so use the hdfs dfs command!

$ hdfs dfs -mkdir -p /user/mudassar/snapdemo 

Validate directory

$ hdfs dfs -ls /user/mudassar 
Found 1 items 
drwxr-xr-x - hdfs hdfs 0 2018-03-15 09:39 /user/mudassar/snapdemo 

Change ownership to mudassar

$ hdfs dfs -chown mudassar /user/mudassar/snapdemo 

Validate change of ownership

$ hdfs dfs -ls /user/mudassar 
Found 1 items 
drwxr-xr-x - mudassar hdfs 0 2018-03-15 09:39 /user/mudassar/snapdemo 

Make the directory snapshottable

$ hdfs dfsadmin -allowSnapshot /user/mudassar/snapdemo 
Allowing snaphot on /user/mudassar/snapdemo succeeded

Show all the snapshottable directories in your cluster (lsSnapshottableDir is a subcommand of hdfs)

$ hdfs lsSnapshottableDir 
drwxr-xr-x 0 mudassar hdfs 0 2018-03-15 09:39 0 65536 /user/mudassar/snapdemo 

Create 2 sample files in /tmp

$ echo "Test one for snaphot No worries No worries I was worried you got stuck and didn't revert the HCC is full of solutions so" > /tmp/text1.txt 
$ echo "The default behavior is that only a superuser is allowed to access all the resources of the Kafka cluster, and no other user can access those resources" > /tmp/text2.txt 

Validate the files were created

$ cd /tmp 
$ ls -lrt 
-rw-r--r-- 1 hdfs hadoop 121 Mar 15 10:04 text1.txt 
-rw-r--r-- 1 hdfs hadoop 152 Mar 15 10:04 text2.txt 

Copy the first file from local /tmp to HDFS

$ hdfs dfs -put text1.txt /user/mudassar/snapdemo 

Create a snapshot of the directory, which now contains text1.txt

$ hdfs dfs -createSnapshot /user/mudassar/snapdemo 
Created snapshot /user/mudassar/snapdemo/.snapshot/s20180315-101148.262 

Note the .snapshot directory above, which is a hidden system directory

Show the snapshotted copy of text1.txt

$ hdfs dfs -ls /user/mudassar/snapdemo/.snapshot/s20180315-101148.262 
Found 1 items 
-rw-r--r-- 3 hdfs hdfs 121 2018-03-15 10:10 /user/mudassar/snapdemo/.snapshot/s20180315-101148.262/text1.txt 

Copy the second file text2.txt from local /tmp to HDFS

$ hdfs dfs -put text2.txt /user/mudassar/snapdemo 

Validate that the 2 files are present

$ hdfs dfs -ls /user/mudassar/snapdemo 
Found 2 items 
-rw-r--r-- 3 hdfs hdfs 121 2018-03-15 10:10 /user/mudassar/snapdemo/text1.txt 
-rw-r--r-- 3 hdfs hdfs 152 2018-03-15 10:19 /user/mudassar/snapdemo/text2.txt 

Simulate accidental loss of the file text1.txt

$ hdfs dfs -rm /user/mudassar/snapdemo/text1.txt 

Indeed, file text1.txt was deleted; ONLY text2.txt remains

$ hdfs dfs -ls /user/mudassar/snapdemo 
Found 1 items 
-rw-r--r-- 3 hdfs hdfs 152 2018-03-15 10:19 /user/mudassar/snapdemo/text2.txt 

Restore text1.txt from the snapshot

$ hdfs dfs -cp -ptopax /user/mudassar/snapdemo/.snapshot/s20180315-101148.262/text1.txt /user/mudassar/snapdemo 

For -ptopax to also restore the timestamps, dfs.namenode.accesstime.precision must not be disabled (set to 0); its default is 1 hour, which is 3600000 milliseconds.
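(To check the current value on your cluster, hdfs getconf can read the property straight from the configuration; it should print 3600000 if the default is in effect:)

$ hdfs getconf -confKey dfs.namenode.accesstime.precision 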

Check that text1.txt is back with its original timestamp (compare with the listing above)!

$ hdfs dfs -ls /user/mudassar/snapdemo 
Found 2 items 
-rw-r--r-- 3 hdfs hdfs 121 2018-03-15 10:10 /user/mudassar/snapdemo/text1.txt 
-rw-r--r-- 3 hdfs hdfs 152 2018-03-15 10:19 /user/mudassar/snapdemo/text2.txt 

In a nutshell, you don't need to create directory2 because when you run the hdfs dfs -createSnapshot command it auto-creates a directory under the original one starting with .snapshot, which also saves you the extra step of creating a sort of backup directory.
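(For completeness, a few related snapshot commands you are likely to want as you practice; mysnap and mysnap2 are just illustrative names:)

$ hdfs dfs -createSnapshot /user/mudassar/snapdemo mysnap         # create a snapshot with an explicit name 
$ hdfs dfs -renameSnapshot /user/mudassar/snapdemo mysnap mysnap2 
$ hdfs snapshotDiff /user/mudassar/snapdemo mysnap2 .             # diff a snapshot against the current state 
$ hdfs dfs -deleteSnapshot /user/mudassar/snapdemo mysnap2 
$ hdfs dfsadmin -disallowSnapshot /user/mudassar/snapdemo         # allowed only after all snapshots are deleted 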


I hope that explains it clearly this time.

New Contributor

Hi Shelton,

Please write me the command that recovers snapshot data in such a way that it retains its ownership, timestamp, permissions and ACLs.

Looking forward to hearing from you.

Thanks


Thanks a lot 🙂 @Geoffrey Shelton Okot
I got the desired output. I will do some more work on this topic.
I will let you know if I get stuck at any point.
One thing I want to let you know: the command below is not working on my machine:
"hdfs lsSnapshottableDir " Show all the snapshot-table directories
Cheers 🙂