I have 2 kerberized clusters, both connected to the same AD, one of them with HDP 2.4 and the other with HDP 2.5. Now I would like to move all the data from one cluster to another.
I have been reading a lot about it, like the following links:
What I am doing is the following:
As the hdfs user in cluster 1, I can list all the files, but I can only copy the files on which the hdfs user has explicit permissions. For example:
A file with permissions 770, owned by user1 with group hdfs, can be copied.
But a file with permissions 700, owned by user1 with group hdfs or any other group, cannot be copied.
Also, the second cluster is configured in HA, but I cannot use the nameservice name defined for HA; I have to point directly to the active master namenode (which can be different each time).
With the following command:
hadoop distcp hdfs://master01/projects/folder hdfs://manager01/projects/.
If the hdfs user does not have permissions on a file, I get the following error:
17/02/01 12:04:56 INFO mapreduce.Job: Task Id : attempt_1485252670123_0029_m_000003_1, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://cluster01/projects/folder --> hdfs://cluster02/projects/folder
    org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:285)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:253)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
What should I do to copy all the files? Should I first change all the permissions to 777?
Thanks in advance
I think you have to adjust your krb5.conf. The trick lies in the [capaths] section: check that both clusters have an identical configuration there.
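One way to do that check is to pull the [capaths] section out of each cluster's krb5.conf and compare the two. The snippet below is only a rough sketch: the function names capaths_section and same_capaths are made up for illustration, and it assumes you have already copied both krb5.conf files to one machine.

```shell
# Print the non-empty lines inside the [capaths] section of a krb5.conf-style file.
# (capaths_section is a hypothetical helper name, not part of any Kerberos tool.)
capaths_section() {
  awk '
    # A line starting with "[" opens a new section; remember whether it is [capaths].
    /^[[:space:]]*\[/ { in_section = ($0 ~ /^[[:space:]]*\[capaths\]/); next }
    # Inside [capaths], print every non-blank line.
    in_section && NF  { print }
  ' "$1"
}

# Succeed (exit status 0) only if both files have the same [capaths] body.
same_capaths() {
  [ "$(capaths_section "$1")" = "$(capaths_section "$2")" ]
}

# Example usage (the file names are assumptions):
#   same_capaths cluster1-krb5.conf cluster2-krb5.conf && echo "identical" || echo "different"
```

The comparison is textual, so differences in whitespace or ordering will show up as "different" even if the settings are equivalent.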
Please go through the attached document; it should help you.
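On the HA point in the question: the client that launches distcp can usually only resolve a remote nameservice name if it knows which namenodes are behind it. A common approach is to pass the standard HDFS HA client properties to the job with -D. The following is a sketch under assumptions, not a verified setup for your clusters: ha_client_opts is a hypothetical helper, the hosts nn1.example.com and nn2.example.com and port 8020 are placeholders, and cluster02 is taken from the error message above as the nameservice name. If the source cluster is also in HA, its nameservice must be included in the first argument as well.

```shell
# Print the -D options that describe a remote HA nameservice to the HDFS client.
# Arguments: <all nameservices, comma-separated> <remote nameservice> <nn1 host:port> <nn2 host:port>
# (ha_client_opts is a hypothetical helper; the property names are the standard HDFS HA client settings.)
ha_client_opts() {
  all_ns=$1; ns=$2; nn1=$3; nn2=$4
  printf '%s\n' \
    "-Ddfs.nameservices=${all_ns}" \
    "-Ddfs.ha.namenodes.${ns}=nn1,nn2" \
    "-Ddfs.namenode.rpc-address.${ns}.nn1=${nn1}" \
    "-Ddfs.namenode.rpc-address.${ns}.nn2=${nn2}" \
    "-Ddfs.client.failover.proxy.provider.${ns}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
}

# Dry run: print the distcp command instead of executing it.
# Host names and port are placeholders; replace them with the real namenodes.
echo hadoop distcp \
  $(ha_client_opts cluster02 cluster02 nn1.example.com:8020 nn2.example.com:8020) \
  hdfs://master01/projects/folder hdfs://cluster02/projects/
```

Remove the leading echo once the printed command looks right. Passing the properties per job keeps the cluster-wide hdfs-site.xml untouched, which seems the safer choice while testing.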