<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Distcp got stuck with the below and doesn’t do anything. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292352#M216044</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/75838"&gt;@Arun66&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Unfortunately, with such a vague and incomplete log, we can't help much.&lt;/P&gt;&lt;P&gt;A few questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is this CDH or HWX?&lt;/LI&gt;&lt;LI&gt;Can you share the logs?&lt;/LI&gt;&lt;LI&gt;Can you share the command being executed?&lt;/LI&gt;&lt;LI&gt;Is the cluster Kerberized?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Please also include any other information you deem important.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Happy hadooping&lt;/P&gt;</description>
    <pubDate>Mon, 23 Mar 2020 21:55:00 GMT</pubDate>
    <dc:creator>Shelton</dc:creator>
    <dc:date>2020-03-23T21:55:00Z</dc:date>
    <item>
      <title>Distcp got stuck with the below and doesn’t do anything.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292259#M215996</link>
      <description>&lt;PRE&gt;&lt;SPAN class="line"&gt;INFO tools.DistCp: Input Options: DistCpOptions ooxx&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN class="line"&gt;INFO client.AHSProxy: Connecting to Application History server at ooxx&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN class="line"&gt;INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 256 for oo at xx&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN class="line"&gt;INFO security.TokenCache: Got dt for hdfs://clusterA:8020; Kind: HDFS_DELEGATION_TOKEN, Service: ...&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN class="line"&gt;INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN class="line"&gt;INFO tools.SimpleCopyListing: Build file listing&lt;/SPAN&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 23 Mar 2020 02:13:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292259#M215996</guid>
      <dc:creator>Arun66</dc:creator>
      <dc:date>2020-03-23T02:13:33Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp got stuck with the below and doesn’t do anything.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292352#M216044</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/75838"&gt;@Arun66&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Unfortunately, with such a vague and incomplete log, we can't help much.&lt;/P&gt;&lt;P&gt;A few questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is this CDH or HWX?&lt;/LI&gt;&lt;LI&gt;Can you share the logs?&lt;/LI&gt;&lt;LI&gt;Can you share the command being executed?&lt;/LI&gt;&lt;LI&gt;Is the cluster Kerberized?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Please also include any other information you deem important.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Happy hadooping&lt;/P&gt;</description>
      <pubDate>Mon, 23 Mar 2020 21:55:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292352#M216044</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2020-03-23T21:55:00Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp got stuck with the below and doesn’t do anything.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292354#M216046</link>
      <description>&lt;P&gt;Yes, the cluster is Kerberized (HWX, HDP 3.1.5). I can't seem to find any logs for the operation below. Here is the simple command:&lt;/P&gt;&lt;P&gt;hadoop distcp /user/home/test.txt /tmp/&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;20/03/23 18:16:59 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/home/test.txt], targetPath=/tmp, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false, directWrite=false}, sourcePaths=[/user/home/test.txt], targetPathExists=true, preserveRawXattrsfalse&lt;BR /&gt;20/03/23 18:16:59 INFO client.AHSProxy: Connecting to Application History server at host:10200&lt;BR /&gt;20/03/23 18:16:59 INFO hdfs.DFSClient: Created token for eid: HDFS_DELEGATION_TOKEN owner=EID@Domian.COM, renewer=yarn, realUser=, issueDate=1585001819568, maxDate=1585606619568, sequenceNumber=44990, masterKeyId=161 on ha-hdfs:nn-ha&lt;BR /&gt;20/03/23 18:16:59 INFO kms.KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://https@host:9393/kms, Ident: (kms-dt owner=Eid, renewer=yarn, realUser=, issueDate=1585001819728, maxDate=1585606619728, sequenceNumber=11938, masterKeyId=7))&lt;BR /&gt;20/03/23 18:16:59 INFO security.TokenCache: Got dt for hdfs://nn-ha; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nn-ha, Ident: (token for eid: HDFS_DELEGATION_TOKEN owner=eid@AOINS.COM, renewer=yarn, realUser=, issueDate=1585001819568, maxDate=1585606619568, sequenceNumber=44990, masterKeyId=161)&lt;BR /&gt;20/03/23 18:16:59 INFO security.TokenCache: Got dt for hdfs://nn-ha; Kind: kms-dt, Service: kms://https@host:9393/kms, Ident: (kms-dt owner=eid, renewer=yarn, realUser=, issueDate=1585001819728, maxDate=1585606619728, sequenceNumber=11938, masterKeyId=7)&lt;BR /&gt;20/03/23 18:16:59 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0&lt;BR /&gt;20/03/23 18:16:59 INFO tools.SimpleCopyListing: Build file listing completed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;When I press Ctrl+C to abort the long-running DistCp job, I get the exceptions below:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ERROR hdfs.DFSClient: Failed to close file: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0 with inode: 56047223&lt;BR /&gt;org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0 (inode 56047223) Holder DFSClient_NONMAPREDUCE_1216352325_1 does not have any open files.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ERROR hdfs.DFSClient: Failed to close file: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0.index with inode: 56047224&lt;BR /&gt;org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/home/.staging/_distcp-1688802777/fileList.seq_sorted.0.index (inode 56047224) Holder DFSClient_NONMAPREDUCE_1216352325_1 does not have any open files.&lt;/P&gt;&lt;P&gt;ERROR tools.DistCp: Exception encountered&lt;BR /&gt;java.nio.channels.ClosedChannelException&lt;/P&gt;</description>
      <pubDate>Mon, 23 Mar 2020 22:26:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292354#M216046</guid>
      <dc:creator>kasa</dc:creator>
      <dc:date>2020-03-23T22:26:16Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp got stuck with the below and doesn’t do anything.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292357#M216049</link>
      <description>&lt;P&gt;&lt;EM&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/73896"&gt;@kasa&lt;/a&gt;&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;DistCp is used for inter- and intra-cluster copies, but the command you are running does not specify the source and destination &lt;STRONG&gt;NameNodes&lt;/STRONG&gt;:&lt;/EM&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;$ hadoop distcp /user/home/test.txt /tmp/&lt;/LI-CODE&gt;&lt;P&gt;&lt;EM&gt;The most common use of &lt;STRONG&gt;DistCp&lt;/STRONG&gt; is an inter-cluster copy, where you copy from &lt;STRONG&gt;NameNode1 [nn1]&lt;/STRONG&gt; to &lt;STRONG&gt;NameNode2 [nn2]&lt;/STRONG&gt; on two different clusters; both clusters must be up and running during the process:&lt;/EM&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;$ hadoop distcp hdfs://nn1:8020/source hdfs://nn2:8020/destination&lt;/LI-CODE&gt;&lt;P&gt;&lt;EM&gt;Here, &lt;STRONG&gt;hdfs://nn1:8020/source&lt;/STRONG&gt; is the data source and &lt;STRONG&gt;hdfs://nn2:8020/destination&lt;/STRONG&gt; is the destination. This will expand the namespace under &lt;STRONG&gt;/source&lt;/STRONG&gt; on NameNode "&lt;STRONG&gt;nn1&lt;/STRONG&gt;" into a temporary file, partition its contents among a set of map&amp;nbsp;tasks, and start copying from "&lt;STRONG&gt;nn1&lt;/STRONG&gt;" to "&lt;STRONG&gt;nn2&lt;/STRONG&gt;". Note that DistCp requires absolute paths.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Personally, I think you should use copyToLocal instead, since, as I understand it, you are trying to copy a file from HDFS to your local /tmp directory.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Assuming your directory &lt;STRONG&gt;/user/home/&lt;/STRONG&gt; is in HDFS and you are running the command as the HDFS user, this will copy test.txt from HDFS to the local /tmp directory:&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;$ hdfs dfs -copyToLocal /user/home/test.txt /tmp/&lt;/LI-CODE&gt;&lt;P&gt;&lt;EM&gt;To copy successfully between two Kerberized clusters, you should set up a &lt;A href="https://community.cloudera.com/t5/Community-Articles/Kerberos-cross-realm-trust-for-distcp/ta-p/245590" target="_blank" rel="noopener"&gt;Kerberos cross-realm trust for distcp&lt;/A&gt;. It's simple to set up; just follow the guide and you will be fine.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Please let me know if my assumption is correct.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Mar 2020 23:36:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292357#M216049</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2020-03-23T23:36:18Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp got stuck with the below and doesn’t do anything.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292358#M216050</link>
      <description>&lt;P&gt;I absolutely agree. Initially I was testing DistCp between two trusted clusters with the command below. Since it was getting stuck, I ran a simple test to copy within the cluster, but I still hit the same issue. (All the prerequisites for DistCp are met.)&lt;/P&gt;&lt;P&gt;hadoop distcp hdfs://nn:8020/user/hdfs_home_eid/test.txt hdfs://nn:8020/tmp/&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Just as an FYI, the link below describes a similar issue, but it didn't help me resolve the problem:&lt;/P&gt;&lt;P&gt;&lt;A href="http://people.apache.org/~liuml07/2017/07/05/DistCp-gets-stuck-with-build-listing/" target="_blank"&gt;http://people.apache.org/~liuml07/2017/07/05/DistCp-gets-stuck-with-build-listing/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 00:00:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292358#M216050</guid>
      <dc:creator>kasa</dc:creator>
      <dc:date>2020-03-24T00:00:34Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp got stuck with the below and doesn’t do anything.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292359#M216051</link>
      <description>&lt;P&gt;Another thing I noticed while testing the same command on another cluster: DistCp gets stuck while trying to connect to the Application History Server.&lt;/P&gt;&lt;P&gt;20/03/23 20:38:40 INFO client.AHSProxy: Connecting to Application History server at host/ipaddress:10200&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 00:51:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292359#M216051</guid>
      <dc:creator>kasa</dc:creator>
      <dc:date>2020-03-24T00:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp got stuck with the below and doesn’t do anything.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292391#M216065</link>
      <description>&lt;P&gt;&lt;EM&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/73896"&gt;@kasa&lt;/a&gt;&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Can you share a scrambled version of the &lt;STRONG&gt;krb5.conf&lt;/STRONG&gt; and the &lt;STRONG&gt;auth_to_local&lt;/STRONG&gt; rules from both clusters?&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;When copying data from one secure cluster to another, the following configuration setting is required in the&amp;nbsp;core-site.xml&amp;nbsp;file:&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;hadoop.security.auth_to_local&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;&amp;lt;/value&amp;gt;
    &amp;lt;description&amp;gt;Maps kerberos principals to local user names&amp;lt;/description&amp;gt;
&amp;lt;/property&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;U&gt;&lt;STRONG&gt;Secure-to-Secure: Kerberos principal name&lt;/STRONG&gt;&lt;/U&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Assign the same principal name to the applicable NameNodes in the source and destination clusters.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;distcp hdfs://hdp-2.0-secure hdfs://hdp-2.0-secure&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;The SASL RPC client requires that the remote server’s Kerberos principal match the server principal in its own configuration. Therefore, the same principal name must be assigned to the applicable NameNodes in the source and destination clusters.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;For example, if the Kerberos principal name of the NameNode in the source cluster is &lt;STRONG&gt;nn/host1@realm&lt;/STRONG&gt;, the Kerberos principal name of the NameNode in the destination cluster must be &lt;STRONG&gt;nn/host2@realm&lt;/STRONG&gt;, rather than &lt;STRONG&gt;nn2/host2@realm&lt;/STRONG&gt;.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;U&gt;&lt;STRONG&gt;Secure-to-Secure: ResourceManager mapping rules&lt;/STRONG&gt;&lt;/U&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;When copying between two secure HDP 2 clusters, further ResourceManager (RM) configuration is required if the two clusters have different realms.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Can you share the &lt;STRONG&gt;hadoop.security.auth_to_local&lt;/STRONG&gt; setting from both clusters? For DistCp to succeed, the same RM mapping rule must be used in both clusters. Assuming the realms are &lt;STRONG&gt;CLUSTER1.TEST.COM&lt;/STRONG&gt; and &lt;STRONG&gt;CLUSTER2.DEV.COM&lt;/STRONG&gt; for clusters 1 and 2 respectively, the rules would look like this:&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;hadoop.security.auth_to_local&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;
        RULE:[2:$1@$0](rm@.*CLUSTER1.TEST.COM)s/.*/yarn/
        RULE:[2:$1@$0](rm@.*CLUSTER2.DEV.COM)s/.*/yarn/
        DEFAULT
    &amp;lt;/value&amp;gt;
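    &amp;lt;!-- Illustrative reading of the rules above (standard auth_to_local syntax):
         [2:$1@$0] applies only to two-component principals such as rm/host@CLUSTER1.TEST.COM
         and rewrites them as "rm@CLUSTER1.TEST.COM" ($1 is the first component, $0 the realm);
         (rm@.*CLUSTER1.TEST.COM) is the regex that the rewritten name must match;
         s/.*/yarn/ then replaces the whole name with the local user "yarn";
         DEFAULT strips the realm from principals in the cluster's default realm. --&amp;gt;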
&amp;lt;/property&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Can you try that and report back?&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2020 10:35:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292391#M216065</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2020-03-24T10:35:53Z</dc:date>
    </item>
    <item>
      <title>Re: Distcp got stuck with the below and doesn’t do anything.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292572#M216168</link>
      <description>&lt;P&gt;Below are the steps we took to troubleshoot DistCp:&lt;/P&gt;&lt;P&gt;1. The problem was not with HDFS, Kerberos, or DistCp, but with MapReduce.&lt;BR /&gt;2. We ran a sample MR job as a test, and it failed with the following exception: Error: java.io.IOException: initialization of all the collectors failed. Error in last collector was:java.io.IOException: Invalid &lt;STRONG&gt;“mapreduce.task.io.sort.mb”:3276&lt;/STRONG&gt;. This property sets the total amount of buffer memory to use while sorting files, in MB, and its value must be less than 2048. After changing it, DistCp ran smoothly.&lt;/P&gt;&lt;P&gt;I want to take a moment to thank Shelton for responding so promptly.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Mar 2020 17:38:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Distcp-got-stuck-with-the-below-and-doesn-t-do-anything/m-p/292572#M216168</guid>
      <dc:creator>Arun66</dc:creator>
      <dc:date>2020-03-25T17:38:52Z</dc:date>
    </item>
  </channel>
</rss>

