
Impersonation for distcp

Rising Star

Hi,

I am working on a distcp solution between two clusters. On cluster01 HDFS there are multiple directories, each owned by a different application team. The requirement is to distcp these directories onto cluster02 while preserving the access privileges. Both clusters are secured (Kerberized).

I was thinking of having a service user, something like "distcp-user", with its own Kerberos principal that can manage the distcp process; auditing would be easy as well.

  • Would it be possible for distcp-user to complete the distcp process without itself having read access on cluster01 and write access on cluster02? Is this something impersonation can help with? For example, if dir1 on cluster01 is owned by appuser1 and dir2 is owned by appuser2, can distcp-user impersonate both appuser1 and appuser2 and run the distcp jobs on their behalf, without being able to read the underlying data directly? (A rough sketch of what I have in mind follows this list.)
  • Or is this only possible if distcp-user has the appropriate read access on cluster01 and write access on cluster02, managed through Ranger / HDFS ACLs?
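
To make the first option concrete, something along these lines is what I am picturing, assuming the clusters' proxy-user settings allow distcp-user to act on behalf of the application users; the keytab path, realm, and NameNode addresses below are just placeholders:

# Authenticate as the distcp service principal (placeholder keytab and realm)
kinit -kt /etc/security/keytabs/distcp-user.keytab distcp-user@EXAMPLE.COM

# Copy dir1 on behalf of appuser1, preserving user, group and permissions (-pugp)
HADOOP_PROXY_USER=appuser1 hadoop distcp -pugp -update \
    hdfs://cluster01-nn:8020/data/dir1 hdfs://cluster02-nn:8020/data/dir1

# Copy dir2 on behalf of appuser2
HADOOP_PROXY_USER=appuser2 hadoop distcp -pugp -update \
    hdfs://cluster01-nn:8020/data/dir2 hdfs://cluster02-nn:8020/data/dir2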

Thanks

Vijay

1 ACCEPTED SOLUTION

Master Guru

@Vijaya Narayana Reddy Bhoomi Reddy

You can leverage Kerberos impersonation and keep maintaining the read/write policies for the users you plan to impersonate through Ranger. Set up a Ranger policy on cluster01 that allows the user to read, and a Ranger policy on cluster02 that allows the user to write. Have you looked into Apache Falcon? It might be easier for setting up the replication.
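
Policies like these can be created in the Ranger Admin UI, or scripted against Ranger's public REST API. Purely as an illustration of the read side, with a placeholder Ranger host, admin credentials, HDFS service name (cluster01_hadoop) and path:

curl -u admin:admin -H "Content-Type: application/json" \
    -X POST http://ranger-host:6080/service/public/v2/api/policy \
    -d '{
          "service": "cluster01_hadoop",
          "name": "appuser1-read-dir1",
          "isEnabled": true,
          "resources": { "path": { "values": ["/data/dir1"], "isRecursive": true } },
          "policyItems": [ {
            "users": ["appuser1"],
            "accesses": [ { "type": "read",    "isAllowed": true },
                          { "type": "execute", "isAllowed": true } ]
          } ]
        }'

An equivalent policy on cluster02's Ranger, granting write (and execute) on the target paths, covers the write side.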

Confirm that hadoop.security.authorization is set to true in core-site.xml.
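
For reference, that property looks like this:

<property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
</property>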

To enable Kerberos impersonation, add the following proxy-user properties to core-site.xml:

<property>
    <name>hadoop.proxyuser.yourapp.groups</name>
    <value>ImpersonationGrp1,ImpersonationGrp2</value>
</property>
<property>
    <name>hadoop.proxyuser.yourapp.hosts</name>
    <value>host</value>
</property>

Replace yourapp with your service principal name, ImpersonationGrp1 and ImpersonationGrp2 with the groups your user is allowed to impersonate, and host with your app server.
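
As a rough sketch of rolling this out and verifying it (the keytab path, principal, and realm are placeholders): after updating core-site.xml, the proxy-user settings can be picked up and smoke-tested like so:

# Pick up the new proxyuser settings without restarting services (run as an HDFS/YARN admin)
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration

# Authenticate as the service principal, then act on behalf of one of the application users
kinit -kt /etc/security/keytabs/distcp-user.keytab distcp-user@EXAMPLE.COM
HADOOP_PROXY_USER=appuser1 hdfs dfs -ls /data/dir1

If the settings have not been picked up, the NameNode rejects the request with a "User: ... is not allowed to impersonate ..." error.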


2 REPLIES


Rising Star

Thanks @Sunile Manjee. This is the approach I think I need to follow; I was just trying to understand whether there is any other alternative. To answer your question around Falcon: we are not using it because we are on HDP 2.4.2 and need to leverage HDFS snapshots, which Falcon doesn't support until 2.5. So we are going with this approach for now.