Created 07-26-2016 05:24 PM
Hi,
We have two secured clusters with NameNode HA set up. Let's call them PRIMARY and DR. We are now implementing a DR solution between the clusters, using HDFS snapshots and distcp to replicate data from the PRIMARY cluster to the DR cluster. (We are on HDP 2.4.2, and Falcon doesn't support HDFS snapshots until HDP 2.5, so we had to use HDFS snapshots with distcp.) All the Hadoop daemon accounts on the clusters are prefixed with the cluster name, for example PRIMARY-hdfs, DR-yarn, etc. I have a few questions in this regard:
Apologies if some of these are trivial. Hadoop security is still a grey area for me, so most of these questions concern security.
Thanks
Vijay
Created 07-26-2016 09:08 PM
Please see my answers inline below:
Q: On which node should the distcp job be running?
-> Running the job on the destination is fine. Just remember that distcp builds a "copy list" of the files to copy. For a large cluster with thousands of directories and subdirectories, this can be an expensive operation, especially when run from a remote cluster. That's totally okay; you just need to be aware of it.
First, don't use the hdfs user. The Kerberos principal you use must have read permission on the files you will copy. If that means everything, grant the appropriate permissions. If you are going to use two different principals, you need to make the destination principal a proxy user (i.e. allow impersonation) on your source cluster. Add the following to your source cluster's core-site.xml and restart the source cluster. Use the new core-site.xml to connect to the source cluster.
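For reference, the snapshot-plus-distcp replication described in the question might be driven from a DR-side node roughly as sketched below. This is only an illustration: the nameservice names, the path and the snapshot names are hypothetical, and it assumes the directory is already snapshottable on both clusters, that the previous snapshot exists on both sides, and that the client configuration can resolve both HA nameservices. The script prints the commands by default rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: nameservices, path and snapshot names are hypothetical.
SRC="hdfs://PRIMARY/data/warehouse"
DST="hdfs://DR/data/warehouse"

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 = previous snapshot name, $2 = new snapshot name.
# Assumes $1 exists on both clusters and the destination directory
# has not been modified since $1 was taken there.
sync_snapshots() {
  # Take the new snapshot on the source.
  run hdfs dfs -createSnapshot "$SRC" "$2" &&
  # Copy what changed between the two source snapshots (-diff needs -update).
  run hadoop distcp -update -diff "$1" "$2" "$SRC" "$DST" &&
  # Record the same snapshot on the destination for the next round.
  run hdfs dfs -createSnapshot "$DST" "$2"
}
```

Calling `sync_snapshots s1 s2` prints the three commands in order; with `DRY_RUN=0` it runs them and stops at the first failure.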
<property>
  <name>hadoop.proxyuser.hdfsdestuser.hosts</name>
  <value>[destination host, or wherever this user is connecting from]</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfsdestuser.groups</name>
  <!-- might want to start with * and then restrict -->
  <value>[all the groups whose users this user can impersonate]</value>
</property>
This should enable your destination cluster to read the source data. Also remember that if these users are in different Kerberos realms, you need to set up cross-realm trust. Check this link.
-> Check the previous answer. Don't use the hdfs user. auth_to_local rules may or may not be required; it depends on what access you give the destination user.
-> See above again. If it's the same user, that makes things easy. For users that are different, changing core-site.xml to add a proxy user isn't very complicated either.
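Once the permissions (and any cross-realm trust) are in place, one quick way to confirm that the destination user can read the source is to authenticate as that user and list the source path before attempting a full copy. The sketch below is an assumption-laden illustration: the keytab path, principal and source path passed in are hypothetical, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: the keytab, principal and path passed in are hypothetical.

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 = keytab path, $2 = Kerberos principal, $3 = source HDFS path.
check_source_access() {
  # Obtain a ticket for the destination-side user from its keytab.
  run kinit -kt "$1" "$2" &&
  # Listing the source path fails if authentication or read access is missing.
  run hdfs dfs -ls "$3"
}
```

For example, `check_source_access /path/to/dr-user.keytab dr-user@DR.EXAMPLE.COM hdfs://PRIMARY/data/warehouse` prints the `kinit` and `hdfs dfs -ls` commands that would be run.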
-> Check my answer to your question number 2.
-> Check this link. (Already referred earlier)
Created 07-28-2016 01:36 PM