<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Slow distcp job termination in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414203#M255211</link>
    <description>&lt;P&gt;Thank you for the great insights.&lt;/P&gt;&lt;P&gt;These are the distcp options:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;distcp -skipcrccheck -copybuffersize 32768 -update -pugp  -bandwidth 50 -strategy dynamic&lt;/LI-CODE&gt;&lt;P&gt;And yes, total execution time increases almost linearly with the number of files/directories.&lt;/P&gt;</description>
    <pubDate>Mon, 08 Jun 2026 07:27:16 GMT</pubDate>
    <dc:creator>ganzuoni</dc:creator>
    <dc:date>2026-06-08T07:27:16Z</dc:date>
    <item>
      <title>Slow distcp job termination</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414193#M255201</link>
      <description>&lt;P&gt;I'm copying several directories with several files each.&lt;/P&gt;&lt;P&gt;Full termination of the YARN application seems to be affected by the number of objects transferred.&lt;/P&gt;&lt;P&gt;In this case there are 9 minutes between 100% completion and job termination:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;26/06/05 10:47:20 INFO mapreduce.Job: map 100% reduce 0%
26/06/05 10:56:25 INFO mapreduce.Job: Job job_1780210407885_1954 completed successfully
with
Files Copied=68744
DIR_COPY=23058&lt;/LI-CODE&gt;&lt;P&gt;while here it takes only a few seconds:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;26/06/05 10:43:11 INFO mapreduce.Job:  map 100% reduce 0%
26/06/05 10:43:32 INFO mapreduce.Job: Job job_1780210407885_1958 completed successfully
with
Files Copied=12300
DIR_COPY=191&lt;/LI-CODE&gt;&lt;P&gt;Is there any explanation for such behaviour?&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2026 09:19:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414193#M255201</guid>
      <dc:creator>ganzuoni</dc:creator>
      <dc:date>2026-06-05T09:19:48Z</dc:date>
    </item>
    <item>
      <title>Re: Slow distcp job termination</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414194#M255202</link>
      <description>&lt;P&gt;Checking the NodeManager logs, the time is spent between container removal and the container being reported as succeeded:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;2026-06-05 11:08:15,452 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_e22_1780210407885_1976_01_000013]
2026-06-05 11:17:55,672 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_e22_1780210407885_1976_01_000001 succeeded&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 05 Jun 2026 09:31:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414194#M255202</guid>
      <dc:creator>ganzuoni</dc:creator>
      <dc:date>2026-06-05T09:31:57Z</dc:date>
    </item>
    <item>
      <title>Re: Slow distcp job termination</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414195#M255203</link>
      <description>&lt;P&gt;And checking the RM logs, there is a huge number of calls to the AM, one every 10 seconds:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;2026-06-05 11:09:39,050 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=arcgis   OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS  APPID=application_1780210407885_1976    CONTAINERID=container_e22_1780210407885_1976_01_000014  RESOURCE=&amp;lt;memory:2048, vCores:1&amp;gt;        QUEUENAME=arcgis
2026-06-05 11:09:40,102 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: null is accessing unchecked http://almapwrk15.data.com:34620/ws/v1/mapreduce/jobs/job_1780210407885_1976 which is the app master GUI of application_1780210407885_1976 owned by arcgis
2026-06-05 11:09:50,116 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: null is accessing unchecked http://almapwrk15.data.com:34620/ws/v1/mapreduce/jobs/job_1780210407885_1976 which is the app master GUI of application_1780210407885_1976 owned by arcgis
.....
2026-06-05 11:17:41,013 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: null is accessing unchecked http://almapwrk15.data.com:34620/ws/v1/mapreduce/jobs/job_1780210407885_1976 which is the app master GUI of application_1780210407885_1976 owned by arcgis
2026-06-05 11:17:49,422 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1780210407885_1976_000001 with final state: FINISHING, and exit status: -1000
2026-06-05 11:17:49,422 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1780210407885_1976_000001 State change from RUNNING to FINAL_SAVING on event = UNREGISTERED
2026-06-05 11:17:49,422 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1780210407885_1976 with final state: FINISHING
2026-06-05 11:17:49,422 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1780210407885_1976 State change from RUNNING to FINAL_SAVING on event = ATTEMPT_UNREGISTERED
2026-06-05 11:17:49,433 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1780210407885_1976
2026-06-05 11:17:49,433 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1780210407885_1976_000001 State change from FINAL_SAVING to FINISHING on event = ATTEMPT_UPDATE_SAVED&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 05 Jun 2026 10:03:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414195#M255203</guid>
      <dc:creator>ganzuoni</dc:creator>
      <dc:date>2026-06-05T10:03:25Z</dc:date>
    </item>
    <item>
      <title>Re: Slow distcp job termination</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414196#M255204</link>
      <description>&lt;P&gt;Anyway, everything seems to be related to log collection for JobHistory.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2026 10:04:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414196#M255204</guid>
      <dc:creator>ganzuoni</dc:creator>
      <dc:date>2026-06-05T10:04:11Z</dc:date>
    </item>
    <item>
      <title>Re: Slow distcp job termination</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414201#M255209</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/11754"&gt;@ganzuoni&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our YARN experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/114383"&gt;@vafs&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/84931"&gt;@vchalla&lt;/a&gt;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2026 21:42:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414201#M255209</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2026-06-05T21:42:13Z</dc:date>
    </item>
    <item>
      <title>Re: Slow distcp job termination</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414202#M255210</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/11754"&gt;@ganzuoni&lt;/a&gt;&amp;nbsp;&lt;FONT face="arial,helvetica,sans-serif"&gt;Thank you for sharing the log excerpts and details regarding the distcp job.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;The likely root cause is not YARN container cleanup itself, but the DistCp/MapReduce job finalization phase taking longer due to the much higher object and directory count. The larger job had ~120x more directories than the smaller job, so post-copy operations such as directory metadata handling, output commit/cleanup, target validation, permission/timestamp preservation, and JobHistory finalization can take significantly longer even after the map phase shows 100%.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;The NodeManager/RM logs also support this, since the AM container remained alive until the application moved to FINAL_SAVING/FINISHING. The repeated RM proxy calls every 10 seconds only indicate that the AM web endpoint was still being polled while the job was finalizing; they do not appear to be the cause of the delay.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;1. Review the ApplicationMaster logs for the delayed application.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;yarn logs -applicationId application_1780210407885_1976 \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;| egrep -i "commit|committer|cleanup|CopyCommitter|OutputCommitter|JobHistory|history|rename|delete|_temporary|_SUCCESS|unregister|finish|final"&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;2. 
Check specifically whether the AM container was spending time in job commit or cleanup before unregistering from YARN.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;yarn logs -applicationId application_1780210407885_1976 \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;| egrep -i "commitJob|cleanupJob|job commit|job cleanup|unregister|FinalApplicationStatus|succeeded|FINISHING|FINAL_SAVING"&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;3. Identify the ApplicationMaster container and review only the AM logs if the full log output is too large.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;yarn application -status application_1780210407885_1976&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&amp;gt;Then use the AM container ID from the output/logs and run:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;yarn logs -applicationId application_1780210407885_1976 \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;-containerId container_e22_1780210407885_1976_01_000001 \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;| egrep -i "commit|cleanup|committer|CopyCommitter|OutputCommitter|JobHistory|history|rename|delete|unregister|finish|final"&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;4. 
Confirm the exact DistCp command/options used for the slower job.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;yarn logs -applicationId application_1780210407885_1976 \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;| egrep -i "distcp|-p|-delete|-atomic|-update|-overwrite|-direct|preserve|options"&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&amp;gt;&amp;gt;Pay particular attention to whether any of these options were used:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;-p,-delete,-atomic,-update,-overwrite,-direct&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;5. Check whether metadata preservation may be increasing finalization time.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;If -p was used, confirm which attributes were preserved, for example permissions, ownership, group, timestamps, ACLs, or XAttrs. These can add many filesystem metadata operations when the job has many directories/files.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;yarn logs -applicationId application_1780210407885_1976 \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;| egrep -i "preserve|permission|owner|group|timestamp|acl|xattr|chown|chmod|setTimes"&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;6. Check the ResourceManager logs around the delayed finalization window.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;grep -E "application_1780210407885_1976|appattempt_1780210407885_1976|UNREGISTERED|FINAL_SAVING|FINISHING|AM Released Container" \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;/var/log/hadoop-yarn/yarn-yarn-resourcemanager-*.log&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;7. 
Check the NodeManager logs on the AM host around the same time window.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;From the RM log, the AM web endpoint appears to be on:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;almapwrk15.data.com:34620&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;On that NodeManager host, run:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;grep -E "application_1780210407885_1976|container_e22_1780210407885_1976_01_000001|succeeded|Removed completed containers|ContainerLaunch" \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;/var/log/hadoop-yarn/yarn-yarn-nodemanager-*.log&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;8. Compare with the faster application to confirm whether the delay scales with object/directory count.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;yarn logs -applicationId application_1780210407885_1958 \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;| egrep -i "commit|cleanup|committer|CopyCommitter|OutputCommitter|JobHistory|history|rename|delete|unregister|finish|final"&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;9. If the target is object storage, confirm whether commit/rename/delete operations are slow.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;For object-store targets, rename/delete/list operations can be more expensive than on HDFS. Check for object-store related messages:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;yarn logs -applicationId application_1780210407885_1976 \&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;| egrep -i "s3a|abfs|wasb|ozone|ofs|object|rename|delete|listStatus|copy|multipart|commit"&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;10. 
Suggested conclusion after validation:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;The current evidence points to the MapReduce ApplicationMaster spending additional time in the post-map finalization phase, most likely due to the high number of files/directories and associated metadata/commit/cleanup operations. The RM proxy polling appears to be observational only and does not appear to be the cause. The next confirmation should come from the AM container logs around the gap between map 100% and the AM unregistering from YARN.&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2026 22:25:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414202#M255210</guid>
      <dc:creator>tnair</dc:creator>
      <dc:date>2026-06-05T22:25:24Z</dc:date>
    </item>
    <item>
      <title>Re: Slow distcp job termination</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414203#M255211</link>
      <description>&lt;P&gt;Thank you for the great insights.&lt;/P&gt;&lt;P&gt;These are the distcp options:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;distcp -skipcrccheck -copybuffersize 32768 -update -pugp  -bandwidth 50 -strategy dynamic&lt;/LI-CODE&gt;&lt;P&gt;And yes, total execution time increases almost linearly with the number of files/directories.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jun 2026 07:27:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Slow-distcp-job-termination/m-p/414203#M255211</guid>
      <dc:creator>ganzuoni</dc:creator>
      <dc:date>2026-06-08T07:27:16Z</dc:date>
    </item>
  </channel>
</rss>

