How to copy files/directories onto HDFS while maintaining ownership and permissions, similar to rsync?

Explorer

I am copying files and directories owned by quite a few users from Linux file systems onto HDFS. I usually run rsync as root from one Linux machine to another so that file and directory ownership and permissions are preserved. Now I am dealing with HDFS, and the HDFS admin account is not root. How do I do the equivalent of rsync, but for HDFS?

1 REPLY

Master Mentor

@Jacky Hung

First, as root, scp the files from the source Linux machine into a local directory (e.g. /tmp) on one of the cluster nodes:

# cd /home 
# scp -rp * root@destination:/tmp 
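
As a side note: since you are already comfortable with rsync between Linux machines, you can stage the files on the cluster node with rsync instead of scp; run as root with -a, it preserves ownership and permissions of the staged Linux copy (the paths below are just an example):

# rsync -a /home/ root@destination:/tmp/staging/ 

Either way, the ownership inside HDFS still has to be set with hdfs dfs -chown afterwards, as described below.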

Then, as hdfs (the HDFS superuser), you will have to create a home directory in HDFS for each user whose files you copied earlier.

Create the home directory for user1 in HDFS:

$ hdfs dfs -mkdir /user/user1 
$ hdfs dfs -chown user1 /user/user1 
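
If you have many users, a small shell loop saves some typing (user1, user2, user3 below are just placeholder names):

$ for u in user1 user2 user3; do
>   hdfs dfs -mkdir -p /user/$u
>   hdfs dfs -chown $u /user/$u
> done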

Subsequently, if you want to create subdirectories and recursively change their ownership:

$ hdfs dfs -mkdir -p /user/user1/test/another/final 
$ hdfs dfs -chown -R user1 /user/user1/test/another/final 
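
Since your question is also about permissions, note that -chown only restores ownership; to restore the modes as well, use -chmod -R (750 below is just an example mode):

$ hdfs dfs -chmod -R 750 /user/user1/test 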

Then, as the hdfs user, go to the directory where you scp'ed the files earlier, e.g. /tmp, and upload them into HDFS:

$ cd /tmp 
$ hdfs dfs -put user1_objects /user/user1 

or, to place them in the nested directory:

$ hdfs dfs -put user1_objects /user/user1/test/another/final 
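
One thing to watch out for: files written with -put end up owned by the account that ran the command (hdfs in this case), so re-run the recursive chown afterwards:

$ hdfs dfs -chown -R user1 /user/user1 

Alternatively, if your Hadoop version supports the -p option of -put, it preserves the local permissions, timestamps and (when run as the HDFS superuser) ownership in one step:

$ hdfs dfs -put -p user1_objects /user/user1 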

Check the permissions and ownership:

$ hdfs dfs -ls /user/user1 

You will need to repeat this for each of the other users. Unfortunately, you can't use DistCp here because the source isn't a Hadoop filesystem.
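
Putting it together: assuming you staged each user's files in its own directory on the cluster node (e.g. /tmp/user1, /tmp/user2) and created the home directories as above, the per-user upload can be scripted roughly like this as the hdfs user (the usernames are placeholders):

$ for u in user1 user2 user3; do
>   hdfs dfs -put /tmp/$u/* /user/$u/
>   hdfs dfs -chown -R $u /user/$u
> done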

Hope that helps