Has anyone done distcp between secured clusters that are in different Kerberos REALMs?
Labels: Apache Hadoop
Created 11-25-2015 05:57 AM
From reading http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Sys_Admin_Guides/content/ref-263ee41f-a0a... , it looks like the same realm is implied.
Even Cloudera's documentation mentions distinct realms...
Created 11-25-2015 08:06 PM
A search of the Hortonworks documentation indicates the following three requirements, besides a generally correct Kerberos setup:
1. Both clusters must be using Java 1.7 or later if you are using MIT Kerberos. Java 1.6 has too many known bugs with cross-realm trust; e.g., see http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7061379
2. The same principal name must be assigned to the NameNodes in both the source and the destination cluster. For example, if the Kerberos principal name of the NameNode in cluster A is nn/host1@realm, the Kerberos principal name of the NameNode in cluster B must be nn/host2@realm, not, for example, nn2/host2@realm; see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Sys_Admin_Guides/content/ref-263ee41f-a0a...
3. Bi-directional cross-realm trust must be set up. Correct trust setup can be tested by running an HDFS client on a node in cluster A and checking whether you can put a file or list a directory on cluster B, and vice versa; credit Robert Molina in the old Hortonworks Forums, post-49303.
Note: the key statement behind items #2 and #3 is that "It is important that each NodeManager can reach and communicate with both the source and destination file systems"; see https://hadoop.apache.org/docs/r2.7.1/hadoop-distcp/DistCp.html. This is why the trust must be bi-directional.
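To illustrate requirement #3, a bi-directional trust between two MIT KDCs is typically set up by creating matching cross-realm krbtgt principals in both KDCs, and optionally declaring the trust path in each host's krb5.conf. A minimal sketch, where the realm names A.EXAMPLE.COM and B.EXAMPLE.COM are placeholders for your actual realms:

```ini
; In each KDC's kadmin, create the cross-realm ticket-granting principals.
; The matching principal must have the same password/key in both KDCs:
;   addprinc krbtgt/B.EXAMPLE.COM@A.EXAMPLE.COM
;   addprinc krbtgt/A.EXAMPLE.COM@B.EXAMPLE.COM

; /etc/krb5.conf on cluster hosts (both clusters), declaring a direct path:
[capaths]
    A.EXAMPLE.COM = {
        B.EXAMPLE.COM = .
    }
    B.EXAMPLE.COM = {
        A.EXAMPLE.COM = .
    }
```

After this, the kinit-then-`hdfs dfs -ls` test described in item #3 should work in both directions before attempting distcp.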
Created 06-21-2016 05:32 PM
I have followed the same steps, but when I do distcp between two secured HA clusters, YARN throws a failed-to-renew-token error: kind HDFS_DELEGATION_TOKEN, service: ha-hdfs. I am able to do hadoop fs -ls using the HA nameservice on both clusters. Both clusters have an MIT KDC, the cross-realm setup is done, and both clusters use the same NameNode principal. Is there anything else that I need to do?
Just an additional observation: when I change the framework from yarn to MR in mapred-client.xml, I am able to do distcp. When I use the yarn framework, I get the above error.
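For what it's worth, a token-renewal failure of this shape in cross-realm distcp is the scenario tracked in Apache JIRA YARN-3021 (the ResourceManager in one realm cannot renew delegation tokens issued by the other cluster's NameNode). In Hadoop releases carrying that fix, a commonly cited workaround is to exclude the remote cluster from the ResourceManager's token renewal when submitting the job. A sketch, where the nameservice name is a placeholder for your remote HA nameservice:

```xml
<!-- Passed per-job (-D on the distcp command line) or in mapred-site.xml.
     "remotecluster" is a placeholder for the remote ha-hdfs nameservice. -->
<property>
  <name>mapreduce.job.hdfs-servers.token-renewal.exclude</name>
  <value>remotecluster</value>
</property>
```

This is only applicable if your Hadoop version includes the YARN-3021 change; check the release notes for your distribution before relying on it.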
Created 11-25-2015 06:10 AM
I have not done distcp with different Kerberos REALMs, but I think this should be possible. Our documentation only mentions that the "same principal name must be assigned to the applicable NameNodes", so that the auth_to_local configuration resolves to the same username on both sides (the Kerberos principal nn/host1@realm becomes the user "nn"). As long as the different realms use the same KDC, or the KDCs trust each other, this should be possible.
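To make the auth_to_local point concrete: with identical principal short names, each cluster can map both realms' NameNode principals to the same local user via explicit rules in core-site.xml. A sketch, where the realm names and the target user "hdfs" are placeholders for your setup:

```xml
<!-- core-site.xml (both clusters). Maps nn/<host>@<either realm>
     to the local user "hdfs"; adjust realms and user to your setup. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](nn@A\.EXAMPLE\.COM)s/.*/hdfs/
    RULE:[2:$1@$0](nn@B\.EXAMPLE\.COM)s/.*/hdfs/
    DEFAULT
  </value>
</property>
```

The `[2:$1@$0]` pattern matches two-component principals (nn/host@REALM) and rewrites them to the short name; the `hadoop kerbname` (or `org.apache.hadoop.security.HadoopKerberosName`) utility can be used to verify the mapping.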
Created 06-23-2016 08:35 PM
The fact that distcp works with some configurations indicates you probably have security set up right, and it also gives you an obvious workaround. To try to answer your question, please provide some clarifying information:
- When you speak of mapred-client.xml, do you mean mapred-site.xml on the client machine?
- When you speak of changing the framework, do you mean the "mapreduce.framework.name" configuration parameter in mapred-site.xml?
- Do you change it only on the client machine, or throughout both clusters?
- The allowed values of that parameter are "local", "classic", and "yarn". When you change it to not be "yarn", what do you set it to?
- Do you have "mapreduce.application.framework.path" set? If so, to what value?
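For reference, the framework parameter the questions above refer to normally lives in mapred-site.xml, and a standard YARN-based setup looks like the following (the framework path value is illustrative and varies by distribution):

```xml
<!-- mapred-site.xml: run MapReduce jobs (including distcp) on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```

Setting it to "local" runs the job in-process on the client, which bypasses the ResourceManager's token renewal entirely; that is consistent with the workaround reported above.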
