Created on 03-22-2020 06:44 PM - last edited on 03-22-2020 07:13 PM by ask_bill_brooks
INFO tools.DistCp: Input Options: DistCpOptions ooxx
INFO client.AHSProxy: Connecting to Application History server at ooxx
INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 256 for oo at xx
INFO security.TokenCache: Got dt for hdfs://clusterA:8020; Kind: HDFS_DELEGATION_TOKEN, Service: ...
INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
INFO tools.SimpleCopyListing: Build file listing
Created 03-23-2020 02:55 PM
Unfortunately, with such a vague and incomplete log, we can't help much.
Questions?
And any info you deem important.
Happy hadooping
Created 03-23-2020 03:26 PM
Yes, the cluster is Kerberized (HWX, HDP 3.1.5). I can't seem to find logs for the operation below. This is the simple command:
hadoop distcp /user/home/test.txt /tmp/
20/03/23 18:16:59 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/home/test.txt], targetPath=/tmp, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false, directWrite=false}, sourcePaths=[/user/home/test.txt], targetPathExists=true, preserveRawXattrsfalse
20/03/23 18:16:59 INFO client.AHSProxy: Connecting to Application History server at host:10200
20/03/23 18:16:59 INFO hdfs.DFSClient: Created token for eid: HDFS_DELEGATION_TOKEN owner=EID@Domian.COM, renewer=yarn, realUser=, issueDate=1585001819568, maxDate=1585606619568, sequenceNumber=44990, masterKeyId=161 on ha-hdfs:nn-ha
20/03/23 18:16:59 INFO kms.KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://https@host:9393/kms, Ident: (kms-dt owner=Eid, renewer=yarn, realUser=, issueDate=1585001819728, maxDate=1585606619728, sequenceNumber=11938, masterKeyId=7))
20/03/23 18:16:59 INFO security.TokenCache: Got dt for hdfs://nn-ha; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nn-ha, Ident: (token for eid: HDFS_DELEGATION_TOKEN owner=eid@AOINS.COM, renewer=yarn, realUser=, issueDate=1585001819568, maxDate=1585606619568, sequenceNumber=44990, masterKeyId=161)
20/03/23 18:16:59 INFO security.TokenCache: Got dt for hdfs://nn-ha; Kind: kms-dt, Service: kms://https@host:9393/kms, Ident: (kms-dt owner=eid, renewer=yarn, realUser=, issueDate=1585001819728, maxDate=1585606619728, sequenceNumber=11938, masterKeyId=7)
20/03/23 18:16:59 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
20/03/23 18:16:59 INFO tools.SimpleCopyListing: Build file listing completed.
When I press Ctrl+C to abort the long-running distcp job, it throws the exception below:
ERROR hdfs.DFSClient: Failed to close file: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0 with inode: 56047223
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0 (inode 56047223) Holder DFSClient_NONMAPREDUCE_1216352325_1 does not have any open files.
ERROR hdfs.DFSClient: Failed to close file: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0.index with inode: 56047224
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0.index (inode 56047224) Holder DFSClient_NONMAPREDUCE_1216352325_1 does not have any open files.
ERROR tools.DistCp: Exception encountered
java.nio.channels.ClosedChannelException
Created 03-23-2020 04:36 PM
DistCp is used for inter- and intra-cluster copies, but the command you are running does not look right, because DistCp expects the source and destination NameNodes.
$ hadoop distcp /user/home/test.txt /tmp/
The most common use of DistCp is an inter-cluster copy, where you copy from NameNode1 [nn1] to NameNode2 [nn2] on two different clusters. Both clusters should be up and running during the process.
$ hadoop distcp hdfs://nn1:8020/source hdfs://nn2:8020/destination
Here hdfs://nn1:8020/source is the data source and hdfs://nn2:8020/destination is the destination. This will expand the namespace under /source on NameNode "nn1" into a temporary file, partition its contents among a set of map tasks, and start copying from "nn1" to "nn2". Note that DistCp requires absolute paths.
Personally, I think you should use copyToLocal instead, because as I understand it you are trying to copy a file from HDFS to your local /tmp directory.
Assuming your directory /user/home/ is in HDFS and you are running the command as the HDFS user, this will copy test.txt from HDFS to the local /tmp directory:
$ hdfs dfs -copyToLocal /user/home/test.txt /tmp/
To copy successfully between two Kerberized clusters, you should set up Kerberos cross-realm trust for DistCp. It's simple to set up; just follow the guide and you will be fine.
Please let me know if my assumption is correct.
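To make the fully qualified form above concrete, here is a minimal Python sketch that assembles the inter-cluster command from its parts. The helper names `qualified_uri` and `distcp_command` are made up for illustration, and port 8020 is simply the one used in the example above. Nothing is executed; the functions only build the argument list.

```python
from typing import List


def qualified_uri(namenode: str, path: str, port: int = 8020) -> str:
    """Return an hdfs:// URI for an absolute path on the given NameNode."""
    if not path.startswith("/"):
        # Mirrors the note above that DistCp requires absolute paths.
        raise ValueError(f"expected an absolute path, got {path!r}")
    return f"hdfs://{namenode}:{port}{path}"


def distcp_command(src_nn: str, src_path: str,
                   dst_nn: str, dst_path: str) -> List[str]:
    """Build the argv for an inter-cluster copy (hypothetical helper)."""
    return [
        "hadoop", "distcp",
        qualified_uri(src_nn, src_path),
        qualified_uri(dst_nn, dst_path),
    ]


print(" ".join(distcp_command("nn1", "/source", "nn2", "/destination")))
```

The resulting list could be handed to `subprocess.run` on a node with a Hadoop client installed, but that part is left out because it depends on the environment.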
Created 03-23-2020 05:00 PM
I absolutely agree. Initially I was testing distcp between two trusted clusters with the command below. Since it was getting stuck, I ran a simple test copying within the cluster, but I still hit the same issue. (All the prerequisites for distcp are met.)
hadoop distcp hdfs://nn:8020/user/hdfs_home_eid/test.txt hdfs://nn:8020/tmp/
Just as an FYI, the link below describes a similar issue, but it didn't help me resolve the problem.
http://people.apache.org/~liuml07/2017/07/05/DistCp-gets-stuck-with-build-listing/
Created 03-23-2020 05:51 PM
The other thing I noticed while testing the same in another cluster is that distcp gets stuck when trying to connect to the Application History server.
20/03/23 20:38:40 INFO client.AHSProxy: Connecting to Application History server at host/ipaddress:10200
Created 03-24-2020 03:35 AM
Can you share a scrambled version of your krb5.conf and the auth_to_local rules from both clusters?
When copying data from a secure cluster to a secure cluster, the following configuration setting is required in the core-site.xml file:
<property>
    <name>hadoop.security.auth_to_local</name>
    <value></value>
    <description>Maps kerberos principals to local user names</description>
</property> 
Secure-to-Secure: Kerberos Principal Name
Assign the same principal name to the applicable NameNodes in the source and destination clusters.
distcp hdfs://hdp-2.0-secure hdfs://hdp-2.0-secure
The SASL RPC client requires that the remote server’s Kerberos principal must match the server principal in its own configuration. Therefore, the same principal name must be assigned to the applicable NameNodes in the source and the destination cluster. 
For example, if the Kerberos principal name of the NameNode in the source cluster is nn/host1@realm, the Kerberos principal name of the NameNode in destination cluster must be nn/host2@realm, rather than nn2/host2@realm.
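The naming requirement above can be checked mechanically. The sketch below is only an illustration: `parse_principal` and `same_primary` are hypothetical helpers, and the principals are the ones from the example in this post. It compares just the first (service) component of two principals.

```python
from typing import NamedTuple


class Principal(NamedTuple):
    primary: str   # service part, e.g. "nn"
    instance: str  # host part, e.g. "host1"
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary/instance@realm' into its three parts."""
    rest, _, realm = name.partition("@")
    primary, _, instance = rest.partition("/")
    return Principal(primary, instance, realm)


def same_primary(a: str, b: str) -> bool:
    """True when both principals use the same service (primary) name."""
    return parse_principal(a).primary == parse_principal(b).primary


print(same_primary("nn/host1@realm", "nn/host2@realm"))   # matching names
print(same_primary("nn/host1@realm", "nn2/host2@realm"))  # mismatched names
```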
Secure-to-Secure: ResourceManager mapping rules
When copying between two HDP2 secure clusters, further ResourceManager (RM) configuration is required if the two clusters have different realms.
Can you share your hadoop.security.auth_to_local from both clusters? For DistCp to succeed, the same RM mapping rule must be used in both clusters. I am assuming the realms are TEST.COM and DEV.COM for clusters 1 and 2 respectively.
<property>
    <name>hadoop.security.auth_to_local</name>
    <value>
        RULE:[2:$1@$0](rm@.*CLUSTER1.TEST.COM)s/.*/yarn/
        RULE:[2:$1@$0](rm@.*CLUSTER2.DEV.COM)s/.*/yarn/
        DEFAULT
    </value>
</property>
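To see what such a rule does to a ResourceManager principal, here is a simplified Python approximation of how one `RULE:[2:...]` entry is evaluated. It is a sketch under assumptions, not Hadoop's implementation: `apply_rule` is a hypothetical helper, the host names are invented, and Python's `re` stands in for Java's regex engine, which may differ in edge cases.

```python
import re
from typing import Optional


def apply_rule(principal: str, fmt: str, match: str,
               sed_from: str, sed_to: str) -> Optional[str]:
    """Evaluate one two-component rule; return None if it does not apply."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    if len(components) != 2:
        return None  # the [2:...] prefix selects two-component principals only
    # $0 is the realm; $1 and $2 are the principal's components.
    base = (fmt.replace("$0", realm)
               .replace("$1", components[0])
               .replace("$2", components[1]))
    if re.fullmatch(match, base) is None:
        return None  # the filter in parentheses must match the whole string
    # s/from/to/ without a trailing "g" replaces the first match only.
    return re.sub(sed_from, sed_to, base, count=1)


# Hypothetical RM principals for the two assumed realms.
for p, pattern in [
    ("rm/host1.example.com@CLUSTER1.TEST.COM", r"rm@.*CLUSTER1.TEST.COM"),
    ("rm/host2.example.com@CLUSTER2.DEV.COM", r"rm@.*CLUSTER2.DEV.COM"),
]:
    print(p, "->", apply_rule(p, "$1@$0", pattern, r".*", "yarn"))
```

With both rules present on both clusters, either RM principal maps to the same local user, which is the point of using identical mapping rules.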
Can you try that and report back?
Created 03-25-2020 10:38 AM
Below are the steps we took to troubleshoot distcp:
1. The problem was not with HDFS, Kerberos, or distcp itself, but with MapReduce.
2. We ran a sample MR job as a test, and it failed with the following exception: Error: java.io.IOException: initialization of all the collectors failed. Error in last collector was:java.io.IOException: Invalid "mapreduce.task.io.sort.mb":3276. (This property sets the total amount of buffer memory to use while sorting files, in MB.) It expects a value less than 2048. After lowering this property, distcp ran smoothly.
I want to take a moment to thank Shelton for responding promptly.
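For anyone hitting the same error, the limit can be checked before submitting a job. The sketch below assumes, as far as I recall from the MapReduce source, that the map output buffer rejects any value that does not fit in 11 bits (a mask of 0x7FF); treat that detail as an assumption. `is_valid_sort_mb` is a hypothetical helper name.

```python
SORT_MB_MASK = 0x7FF  # 2047, the largest value that fits in 11 bits


def is_valid_sort_mb(sort_mb: int) -> bool:
    """True when the value survives the mask, i.e. 0 <= sort_mb <= 2047."""
    return (sort_mb & SORT_MB_MASK) == sort_mb


# 3276 is the value from the exception above; the others are for comparison.
for value in (3276, 2048, 2047, 1024):
    print(value, is_valid_sort_mb(value))
```

Since DistCp accepts generic Hadoop options, a valid value can presumably also be set for a single run with something like -Dmapreduce.task.io.sort.mb=1024 on the distcp command line instead of changing the cluster-wide default.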
