<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Impersonation for distcp in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impersonation-for-distcp/m-p/163325#M36822</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am working on a distcp solution between two clusters. On cluster01 HDFS, there are multiple directories, each owned by a different application team. The requirement is to distcp these directories onto cluster02 while preserving the access privileges. Both clusters are secured.&lt;/P&gt;&lt;P&gt;I was thinking of having a service user, something like "distcp-user", with its own Kerberos principal to manage the distcp process, which would also make auditing easy.&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Would it be possible for distcp-user to complete the distcp process without having read access on cluster01 and write access on cluster02? Is this something impersonation can help with? For example, if dir1 on cluster01 is owned by appuser1 and dir2 is owned by appuser2, can distcp-user impersonate both appuser1 and appuser2 and perform the distcp jobs on their behalf without having access to the underlying data itself?&lt;/LI&gt;&lt;LI&gt;Or is it only possible if distcp-user has appropriate read access on cluster01 and write access on cluster02, managed through Ranger / HDFS ACLs?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Vijay&lt;/P&gt;</description>
    <pubDate>Wed, 03 Aug 2016 21:56:00 GMT</pubDate>
    <dc:creator>bhoomireddy_vij</dc:creator>
    <dc:date>2016-08-03T21:56:00Z</dc:date>
    <item>
      <title>Impersonation for distcp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impersonation-for-distcp/m-p/163325#M36822</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am working on a distcp solution between two clusters. On cluster01 HDFS, there are multiple directories, each owned by a different application team. The requirement is to distcp these directories onto cluster02 while preserving the access privileges. Both clusters are secured.&lt;/P&gt;&lt;P&gt;I was thinking of having a service user, something like "distcp-user", with its own Kerberos principal to manage the distcp process, which would also make auditing easy.&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Would it be possible for distcp-user to complete the distcp process without having read access on cluster01 and write access on cluster02? Is this something impersonation can help with? For example, if dir1 on cluster01 is owned by appuser1 and dir2 is owned by appuser2, can distcp-user impersonate both appuser1 and appuser2 and perform the distcp jobs on their behalf without having access to the underlying data itself?&lt;/LI&gt;&lt;LI&gt;Or is it only possible if distcp-user has appropriate read access on cluster01 and write access on cluster02, managed through Ranger / HDFS ACLs?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Vijay&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 21:56:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impersonation-for-distcp/m-p/163325#M36822</guid>
      <dc:creator>bhoomireddy_vij</dc:creator>
      <dc:date>2016-08-03T21:56:00Z</dc:date>
    </item>
    <item>
      <title>Re: Impersonation for distcp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impersonation-for-distcp/m-p/163326#M36823</link>
      <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/2733/bhoomireddyvijay.html" nodeid="2733"&gt;@Vijaya Narayana Reddy Bhoomi Reddy&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can leverage Kerberos impersonation and maintain the read/write policies for the users you plan to impersonate through Ranger. On cluster01, set up a Ranger policy that allows the user to read; on cluster02, set up a Ranger policy that allows the user to write. Have you looked into Apache Falcon? It might make the replication easier to set up.&lt;/P&gt;&lt;P&gt;Confirm that hadoop.security.authorization is set to true.&lt;/P&gt;&lt;P&gt;To enable Kerberos impersonation, add the following to core-site.xml:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;hadoop.proxyuser.yourapp.groups&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;ImpersonationGrp1,ImpersonationGrp2&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;
&amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;hadoop.proxyuser.yourapp.hosts&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;host&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Replace yourapp with the short user name of your service principal. Replace ImpersonationGrp1 and ImpersonationGrp2 with the groups whose members your service user is allowed to impersonate. Finally, replace host with the host name of your app server.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 23:24:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impersonation-for-distcp/m-p/163326#M36823</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-08-03T23:24:04Z</dc:date>
    </item>
    <item>
      <title>Re: Impersonation for distcp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impersonation-for-distcp/m-p/163327#M36824</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/1486/smanjee.html" nodeid="1486"&gt;@Sunile Manjee&lt;/A&gt;. This is the approach I think I need to follow. I was just trying to understand whether there is any other alternative. To answer your question about Falcon: we are not using it because we are on HDP 2.4.2 and need to leverage HDFS snapshots, which Falcon doesn't support until 2.5. So I am going with this approach for now.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 23:35:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impersonation-for-distcp/m-p/163327#M36824</guid>
      <dc:creator>bhoomireddy_vij</dc:creator>
      <dc:date>2016-08-03T23:35:12Z</dc:date>
    </item>
  </channel>
</rss>

