
distcp between 2 kerberized clusters fails due to permissions


Hi all,

I have 2 kerberized clusters, both connected to the same AD, one running HDP 2.4 and the other HDP 2.5. I would now like to move all the data from one cluster to the other.

I have been reading a lot about it, like the following links:

What I am doing is the following:

As the hdfs user on cluster 1, I can list all the files, but I can copy only the files on which the hdfs user has explicit permissions. For example:

A file with permissions 770 for user user1 and the group hdfs can be copied.

But a file with permissions 700 for user user1 cannot be copied, whether its group is hdfs or another group.
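
The two cases above are what the standard owner/group/other permission check would produce for a user who is not treated as the HDFS superuser (a superuser bypasses these checks entirely). Here is a minimal sketch of that check, assuming the hdfs user belongs to the hdfs group and that no ACLs are involved. The function name can_read is hypothetical and only illustrates the rule; it is not a Hadoop command.

```shell
#!/bin/sh
# can_read MODE OWNER GROUP USER [USER_GROUP...]
# Hypothetical helper: returns 0 if USER may read a file with the given
# octal MODE under the plain owner/group/other model (no ACLs, no superuser).
can_read() {
  mode=$(( 0$1 )); owner=$2; group=$3; user=$4
  shift 4
  if [ "$user" = "$owner" ]; then
    bits=$(( (mode >> 6) & 7 ))        # owner digit applies
  else
    bits=$(( mode & 7 ))               # default to the "other" digit
    for g in "$@"; do
      if [ "$g" = "$group" ]; then
        bits=$(( (mode >> 3) & 7 ))    # group digit applies
        break
      fi
    done
  fi
  [ $(( bits & 4 )) -ne 0 ]            # 4 is the read bit
}

# The two cases from the post, for user hdfs in group hdfs:
can_read 770 user1 hdfs hdfs hdfs && echo "770: readable"
can_read 700 user1 hdfs hdfs hdfs || echo "700: denied"
```

If the hdfs user were recognised as superuser by the source NameNode, neither file would be denied, so the 700 failure suggests that the identity the copy runs under is being checked as an ordinary user.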

Also, the second cluster is configured for HA, but I cannot use the nameservice name defined for HA; I have to point directly to the active NameNode (which can be a different host each time).
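
One possible reason the HA name does not resolve is that the client on cluster 1 has no definition of the remote nameservice. The sketch below builds a distcp command that supplies that definition through -D generic options. It is a sketch under stated assumptions, not a confirmed fix: the nameservice IDs (taken from the error message), the NameNode IDs nn1/nn2, the host names and port 8020 are all placeholders, and it assumes cluster 1 is itself addressed by a nameservice that its local configuration already describes.

```shell
#!/bin/sh
# All values below are placeholders (assumptions); replace with real ones.
LOCAL_NS="cluster01"
REMOTE_NS="cluster02"
REMOTE_NN1="nn1.cluster02.example.com:8020"
REMOTE_NN2="nn2.cluster02.example.com:8020"

# Print the client-side options that describe the remote HA nameservice.
remote_ha_opts() {
  printf '%s\n' \
    "-Ddfs.nameservices=${LOCAL_NS},${REMOTE_NS}" \
    "-Ddfs.ha.namenodes.${REMOTE_NS}=nn1,nn2" \
    "-Ddfs.namenode.rpc-address.${REMOTE_NS}.nn1=${REMOTE_NN1}" \
    "-Ddfs.namenode.rpc-address.${REMOTE_NS}.nn2=${REMOTE_NN2}" \
    "-Ddfs.client.failover.proxy.provider.${REMOTE_NS}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
}

# Dry run: print the command for review instead of executing it.
echo hadoop distcp $(remote_ha_opts) \
  "hdfs://${LOCAL_NS}/projects/folder" "hdfs://${REMOTE_NS}/projects/"
```

Printing the command first makes it easy to review the options; removing the echo would run the copy. With the nameservice defined this way, the client can fail over between the two NameNodes instead of depending on whichever one is currently active.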

With the following command:

hadoop distcp hdfs://master01/projects/folder hdfs://manager01/projects/.

If the hdfs user does not have permissions on a file, I get the following error:

17/02/01 12:04:56 INFO mapreduce.Job: Task Id : attempt_1485252670123_0029_m_000003_1, Status : FAILED
Error: File copy failed: hdfs://cluster01/projects/folder --> hdfs://cluster02/projects/folder
    at ...
    at org.apache.hadoop.mapred.MapTask.runNewMapper(...
    at ...
    at org.apache.hadoop.mapred.YarnChild$...
    at ... Method)
    at ...
    at org.apache.hadoop.mapred.YarnChild.main(...

What should I do to copy all the files? Should I first change all the permissions to 777?

Thanks in advance


@Jose Molero

Can you provide the output of the commands below?

hadoop fs -ls /projects/folder    # on Cluster 1

hadoop fs -ls /projects/folder    # on Cluster 2

@Jose Molero

Can you check your core-site.xml property ""

Ref link:


@Jose Molero

I think you have to adjust your krb5.conf. The trick lies in the [capaths] section: check that both clusters have an identical configuration.

Please go through the attached document; it should help you.
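
The "identical configuration" check suggested above can be done by extracting the [capaths] section from each cluster's krb5.conf and comparing the two. Below is a minimal sketch, assuming a copy of the other cluster's krb5.conf has been placed on the local machine. The function names and the path /tmp/cluster02-krb5.conf are hypothetical.

```shell
#!/bin/sh
# Print only the [capaths] section (header included) of a krb5.conf file.
capaths_section() {
  awk '
    /^[ \t]*\[/ { in_section = ($0 ~ /^[ \t]*\[capaths\]/) }
    in_section  { print }
  ' "$1"
}

# Succeed only if both files carry exactly the same [capaths] section.
same_capaths() {
  [ "$(capaths_section "$1")" = "$(capaths_section "$2")" ]
}

# Example usage (paths are assumptions):
#   same_capaths /etc/krb5.conf /tmp/cluster02-krb5.conf \
#     && echo "capaths identical" || echo "capaths differ"
```

The comparison is a plain text match, so differences in whitespace or ordering are reported as a mismatch even when the two sections are functionally equivalent.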