How do I copy files/directories onto HDFS like rsync, maintaining ownership and permissions?
Labels: Apache Hadoop
Created 03-28-2018 09:16 PM
I am copying files and directories owned by quite a few users from Linux file systems onto HDFS. I usually run rsync as root from one Linux machine to another so that file/directory ownership and permissions are preserved. Now I am dealing with HDFS, and the HDFS admin account is not root. How do I get rsync-like behavior, but for HDFS?
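For context, the kind of command I normally run looks like this (host and paths are just examples); the -a flag preserves permissions and, when run as root, ownership:

# rsync -a /data/ root@otherhost:/data/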
Created 03-28-2018 10:13 PM
You can use scp as root to copy everything from the source machine into a local staging directory, e.g. /tmp, on the destination cluster node (-r is needed since you are copying directories):
# cd /home
# scp -r * root@destination:/tmp
Then, as hdfs (the HDFS superuser), create a home directory in HDFS for each user whose files you copied earlier.
For example, create the home directory for user1 in HDFS:
$ hdfs dfs -mkdir /user/user1
$ hdfs dfs -chown user1 /user/user1
If you also want to create subdirectories and recursively change their permissions and ownership:
$ hdfs dfs -mkdir -p /user/user1/test/another/final
$ hdfs dfs -chown -R user1 /user/user1/test/another/final
Then, as the hdfs user, go to the directory where you scp'ed the files earlier, e.g. /tmp, and copy them into HDFS with hdfs dfs -put (-put copies from the local filesystem into HDFS, whereas -cp copies between HDFS paths):
$ cd /tmp
$ hdfs dfs -put user1_objects /user/user1

or

$ hdfs dfs -put user1_objects /user/user1/test/another/final
Check the permissions and ownership:
$ hdfs dfs -ls /user/user1
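One caveat: files written by the hdfs superuser are owned by hdfs regardless of who owns the parent directory, so finish with a recursive chown to hand the copied files back to their user:

$ hdfs dfs -chown -R user1 /user/user1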
You will need to repeat this for every other user; a scripted version is sketched below. Unfortunately, you can't use DistCp here, as the source isn't a Hadoop filesystem.
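As a rough sketch, assuming the staged files under /tmp are grouped into per-user directories named like user1_objects (the usernames and layout here are hypothetical, so adjust them to match your data), the per-user steps can be scripted as the hdfs user:

for u in user1 user2 user3; do
  hdfs dfs -mkdir -p "/user/$u"                 # create the home directory
  hdfs dfs -put "/tmp/${u}_objects" "/user/$u"  # copy staged files into HDFS
  hdfs dfs -chown -R "$u" "/user/$u"            # hand everything back to the user
done

Depending on your Hadoop version, hdfs dfs -put also accepts a -p option that preserves times, ownership, and permissions in one step; check hdfs dfs -help put on your cluster.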
Hope that helps