DistCp gets stuck after the output below and doesn't do anything.

New Contributor
INFO tools.DistCp: Input Options: DistCpOptions ooxx
INFO client.AHSProxy: Connecting to Application History server at ooxx
INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 256 for oo at xx
INFO security.TokenCache: Got dt for hdfs://clusterA:8020; Kind: HDFS_DELEGATION_TOKEN, Service: ...
INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
INFO tools.SimpleCopyListing: Build file listing
1 ACCEPTED SOLUTION

New Contributor

Below are the steps we took to troubleshoot DistCp:

1. It was not a problem with HDFS, Kerberos, or DistCp itself, but with MapReduce.
2. We ran a sample MR job to test, and it failed with the following exception: Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was: java.io.IOException: Invalid "mapreduce.task.io.sort.mb": 3276 (the total amount of buffer memory to use while sorting files, in MB). It expects a value below 2048. After changing this property, DistCp ran smoothly.
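
In case it helps anyone else, a mapred-site.xml override along these lines is the kind of change that fixed it (the 1024 here is only an example value; anything below the 2048 limit should do):

<property>
    <name>mapreduce.task.io.sort.mb</name>
    <!-- per-task sort buffer in MB; must stay below 2048 -->
    <value>1024</value>
</property>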

I want to take a moment to thank Shelton for responding on time.

View solution in original post

7 REPLIES

Master Mentor

@Arun66 

Unfortunately, with such a vague and incomplete log, we can't help much.

Questions:

  1. CDH or HWX?
  2. Can you share the logs?
  3. Can you share the command being executed?
  4. Kerberized or not?

And any other info you deem important.

 

Happy hadooping

 

New Contributor

Yes, the cluster is Kerberized, HWX, HDP 3.1.5. I can't seem to find logs for the operation below. Here is the simple command:

hadoop distcp /user/home/test.txt /tmp/

 

20/03/23 18:16:59 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/home/test.txt], targetPath=/tmp, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false, directWrite=false}, sourcePaths=[/user/home/test.txt], targetPathExists=true, preserveRawXattrsfalse
20/03/23 18:16:59 INFO client.AHSProxy: Connecting to Application History server at host:10200
20/03/23 18:16:59 INFO hdfs.DFSClient: Created token for eid: HDFS_DELEGATION_TOKEN owner=EID@Domian.COM, renewer=yarn, realUser=, issueDate=1585001819568, maxDate=1585606619568, sequenceNumber=44990, masterKeyId=161 on ha-hdfs:nn-ha
20/03/23 18:16:59 INFO kms.KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://https@host:9393/kms, Ident: (kms-dt owner=Eid, renewer=yarn, realUser=, issueDate=1585001819728, maxDate=1585606619728, sequenceNumber=11938, masterKeyId=7))
20/03/23 18:16:59 INFO security.TokenCache: Got dt for hdfs://nn-ha; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nn-ha, Ident: (token for eid: HDFS_DELEGATION_TOKEN owner=eid@AOINS.COM, renewer=yarn, realUser=, issueDate=1585001819568, maxDate=1585606619568, sequenceNumber=44990, masterKeyId=161)
20/03/23 18:16:59 INFO security.TokenCache: Got dt for hdfs://nn-ha; Kind: kms-dt, Service: kms://https@host:9393/kms, Ident: (kms-dt owner=eid, renewer=yarn, realUser=, issueDate=1585001819728, maxDate=1585606619728, sequenceNumber=11938, masterKeyId=7)
20/03/23 18:16:59 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
20/03/23 18:16:59 INFO tools.SimpleCopyListing: Build file listing completed.

 

When I press Ctrl+C to abruptly stop the long-running DistCp job, it gives me the exception below:

 

ERROR hdfs.DFSClient: Failed to close file: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0 with inode: 56047223
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0 (inode 56047223) Holder DFSClient_NONMAPREDUCE_1216352325_1 does not have any open files.

 

ERROR hdfs.DFSClient: Failed to close file: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0.index with inode: 56047224
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0.index (inode 56047224) Holder DFSClient_NONMAPREDUCE_1216352325_1 does not have any open files.

ERROR tools.DistCp: Exception encountered
java.nio.channels.ClosedChannelException

Master Mentor

@kasa 

 

DistCp is used for inter-/intra-cluster copies, but the command you are running is not right, because you need to specify the source and destination NameNodes.

$ hadoop distcp /user/home/test.txt /tmp/

The most common use of DistCp is an inter-cluster copy, where you copy from NameNode1 [nn1] to NameNode2 [nn2] on two different clusters; both clusters should be up and running during the process.

$ hadoop distcp hdfs://nn1:8020/source hdfs://nn2:8020/destination

Where hdfs://nn1:8020/source is the data source, and hdfs://nn2:8020/destination is the destination. This will expand the namespace under /source on NameNode "nn1" into a temporary file, partition its contents among a set of map tasks, and start copying from "nn1" to "nn2". Note that DistCp requires absolute paths.
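
For example (paths and NameNode addresses are placeholders), a typical run that skips files already present at the target and writes a job log looks like this:

$ hadoop distcp -update -log hdfs://nn2:8020/distcp-logs hdfs://nn1:8020/source hdfs://nn2:8020/destination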

 

Personally, I think you should use copyToLocal instead, as my understanding is that you are trying to copy a file from HDFS to your local /tmp directory.

 

Assuming your directory /user/home/ is in HDFS and you are running the command as the hdfs user, this will copy test.txt from HDFS to the local /tmp directory:

 

$ hdfs dfs -copyToLocal /user/home/test.txt /tmp/

And to successfully copy between two Kerberized clusters, you should set up a Kerberos cross-realm trust for DistCp. It is simple to set up; just follow the guide and you will be fine.
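
As a rough sketch only (the realm and KDC host names below are placeholders; adjust to your environment), cross-realm trust boils down to creating matching krbtgt principals in both KDCs and making each realm resolvable from the client nodes:

# on both KDCs, create the cross-realm principals with the same password/kvno (placeholder realms)
$ kadmin.local -q "addprinc krbtgt/DEV.COM@TEST.COM"
$ kadmin.local -q "addprinc krbtgt/TEST.COM@DEV.COM"

# krb5.conf on the nodes running DistCp (placeholder hosts)
[realms]
  TEST.COM = {
    kdc = kdc.test.com
  }
  DEV.COM = {
    kdc = kdc.dev.com
  }

[domain_realm]
  .test.com = TEST.COM
  .dev.com = DEV.COM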

 

Please let me know if my assumption is correct.

 

New Contributor

I absolutely agree. Initially I was testing DistCp between two trusted clusters with the command below. Since it was getting stuck, I did a simple test copying within the cluster, but hit the same issue. (All the prerequisites for DistCp are met.)

hadoop distcp hdfs://nn:8020/user/hdfs_home_eid/test.txt hdfs://nn:8020/tmp/

 

Just as an FYI, the link below describes a similar issue, but it didn't help me resolve the problem:

http://people.apache.org/~liuml07/2017/07/05/DistCp-gets-stuck-with-build-listing/

New Contributor

The other thing I noticed while testing the same in another cluster: DistCp gets stuck when trying to connect to the Application History Server.

20/03/23 20:38:40 INFO client.AHSProxy: Connecting to Application History server at host/ipaddress:10200

Master Mentor

@kasa 

Can you share a scrambled version of your krb5.conf from both clusters, and the hadoop.security.auth_to_local rules from both clusters?

When copying data from a secure cluster to a secure cluster, the following configuration setting is required in the core-site.xml file:

 

<property>
    <name>hadoop.security.auth_to_local</name>
    <value></value>
    <description>Maps kerberos principals to local user names</description>
</property> 

 

Secure-to-Secure: Kerberos Principal Name
Assign the same principal name to the applicable NameNodes in the source and destination clusters.

 

distcp hdfs://hdp-2.0-secure hdfs://hdp-2.0-secure

 


The SASL RPC client requires that the remote server’s Kerberos principal must match the server principal in its own configuration. Therefore, the same principal name must be assigned to the applicable NameNodes in the source and the destination cluster.

For example, if the Kerberos principal name of the NameNode in the source cluster is nn/host1@realm, the Kerberos principal name of the NameNode in destination cluster must be nn/host2@realm, rather than nn2/host2@realm.
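
For illustration only (the realm here is a placeholder), the NameNode principal is set in hdfs-site.xml and should use the same "nn" service name on both clusters:

<property>
    <name>dfs.namenode.kerberos.principal</name>
    <!-- same service name on both clusters; _HOST expands to each NameNode's own hostname -->
    <value>nn/_HOST@REALM</value>
</property>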

 

Secure-to-Secure: ResourceManager mapping rules

When copying between two HDP2 secure clusters, further ResourceManager (RM) configuration is required if the two clusters have different realms.

Can you share your hadoop.security.auth_to_local from both clusters? For DistCp to succeed, the same RM mapping rule must be used in both clusters. I am assuming the realms are TEST.COM and DEV.COM for cluster 1 and cluster 2 respectively:

 

<property>
    <name>hadoop.security.auth_to_local</name>
    <value>
        RULE:[2:$1@$0](rm@.*CLUSTER1.TEST.COM)s/.*/yarn/
        RULE:[2:$1@$0](rm@.*CLUSTER2.DEV.COM)s/.*/yarn/
        DEFAULT
    </value>
</property>
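
A quick way to check how the rules map a given principal (the principal below is only an example) is the HadoopKerberosName utility, which prints the short name a principal resolves to:

$ hadoop org.apache.hadoop.security.HadoopKerberosName rm/somehost@CLUSTER1.TEST.COM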

 

Can you try that and revert?

 

 

New Contributor

Below are the steps we took to troubleshoot DistCp:

1. It was not a problem with HDFS, Kerberos, or DistCp itself, but with MapReduce.
2. We ran a sample MR job to test, and it failed with the following exception: Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was: java.io.IOException: Invalid "mapreduce.task.io.sort.mb": 3276 (the total amount of buffer memory to use while sorting files, in MB). It expects a value below 2048. After changing this property, DistCp ran smoothly.

I want to take a moment to thank Shelton for responding on time.