Created 04-21-2022 03:29 AM
Hello Team,
In our customer's cluster, we are testing HDFS replication through Cloudera Manager. The replication policy looks as follows.
All other settings are left at their defaults. The replication has been hung in the state below for a long time.
We checked the Cloudera Manager logs and see the following error occurring repeatedly. Can you please help us resolve the issue?
2022-04-21 12:27:57,199 ERROR CommandPusher-1:com.cloudera.cmf.service.AgentResultFetcher: Exception occured while handling tempfile com.cloudera.cmf.service.AgentResultFetcher@618eac09
Best Regards
Sayed Anisul Hoque
Created 04-22-2022 02:24 AM
The issue was resolved. The cause was the ownership of the subdirectories of /var/lib/cloudera-scm-server: their owner and group must be cloudera-scm:cloudera-scm, but somehow they had been changed to root:root.
Created 04-21-2022 07:18 AM
Below are the logs from the Cloudera Manager agent on the host running the task.
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Launching process. one-off True, command dr/precopylistingcheck.sh, args [u'-bandwidth', u'100', u'-i', u'-m', u'20', u'-prbugpa', u'-skipAclErr', u'-update', u'-proxyuser', u'hbackup', u'-log', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp/2022-04-21_9975', u'-sequenceFilePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/fileList.seq', u'-diffRenameDeletePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/renamesDeletesList.seq', u'-sourceconf', u'source-client-conf', u'-sourceprincipal', u'hdfs/SOURCE_HOSTNAME', u'-sourcetktcache', u'source.tgt', u'-copyListingOnSource', u'-useSnapshots', u'distcp-33--26584462', u'-ignoreSnapshotFailures', u'-diff', u'-useDistCpFileStatus', u'-replaceNameservice', u'-strategy', u'dynamic', u'-filters', u'exclusion-filter.list', u'-scheduleId', u'33', u'-scheduleName', u'test-copy', u'/test-prod2-copy', u'/test-prod2-copy']
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor WARNING Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 2815-hdfs-precopylistingcheck-40444302'>)
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor INFO Triggering supervisord update.
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using generic audit plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Creating metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using specific metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using generic metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Begin audit plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue throttling_logger INFO (22 skipped) Scheduling a refresh for Audit Plugin for hdfs-precopylistingcheck-40444302 with count 1 pipelines names [''].
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Begin metadata plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Not creating a monitor for 2815-hdfs-precopylistingcheck-40444302: should_monitor returns false
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Daemon refresh complete for process 2815-hdfs-precopylistingcheck-40444302.
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin navigator_plugin INFO Pipelines updated for Metadata Plugin: []
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin throttling_logger INFO (22 skipped) Refreshing Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:09 +0200] 1697 Audit-Plugin navigator_plugin INFO Pipelines updated for Audit Plugin: []
[21/Apr/2022 15:55:10 +0200] 1697 MainThread process INFO [2815-hdfs-precopylistingcheck-40444302] Unregistered supervisor process EXITED
[21/Apr/2022 15:55:10 +0200] 1697 MainThread supervisor INFO Triggering supervisord update.
[21/Apr/2022 15:55:10 +0200] 1697 MainThread throttling_logger INFO Removed keytab /var/run/cloudera-scm-agent/process/2815-hdfs-precopylistingcheck-40444302/hdfs.keytab as a candidate to kinit from
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'running': (True, False), u'run_generation': (1, 5)}
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:29 +0200] 1697 Metadata-Plugin navigator_plugin INFO stopping Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:29 +0200] 1697 Audit-Plugin navigator_plugin INFO stopping Audit Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (5, 8)}
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (8, 11)}
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (11, 15)}
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (15, 19)}
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (19, 23)}
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
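The `Launching process` line at the top of the agent log carries the full argument list passed to dr/precopylistingcheck.sh as a Python-style list. The sketch below splits it into options and positional arguments so the policy settings (bandwidth, proxy user, strategy, schedule name) are easier to read. `parse_launch_args` is a hypothetical helper, and pairing a flag with the next non-flag token is a heuristic assumption: it works for this line, but it would misread a boolean flag that happens to sit directly before a positional argument.

```python
import ast
from typing import Dict, List, Tuple, Union


def parse_launch_args(log_line: str) -> Tuple[Dict[str, Union[str, bool]], List[str]]:
    """Split the 'args [...]' list of an agent 'Launching process' line
    into {flag: value} options plus trailing positional arguments."""
    marker = ", args "
    start = log_line.index(marker) + len(marker)
    # The agent logs a Python repr list (u'...' strings); literal_eval parses it safely.
    tokens = ast.literal_eval(log_line[start:].strip())
    options: Dict[str, Union[str, bool]] = {}
    positional: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            # Heuristic: a following token that is not itself a flag is this flag's value.
            if nxt is not None and not nxt.startswith("-"):
                options[tok] = nxt
                i += 2
                continue
            options[tok] = True  # flag with no value
        else:
            positional.append(tok)
        i += 1
    return options, positional
```

On the line above, the two trailing positional arguments are both `/test-prod2-copy`; in DistCp-style command lines these are the source and target paths.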
The following lines keep repeating:
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
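The repeating block shows the agent re-processing the same already-exited process (`Unregistered supervisor process EXITED` at 15:55:10) on a fixed cycle, with `run_generation` climbing each time. A small Python sketch to measure that from a log excerpt follows; the regex is an assumption fitted to the `Updating process` lines shown here, not a documented log format, and `update_cycles` / `intervals_seconds` are hypothetical helpers.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Fitted to lines like:
# [21/Apr/2022 15:55:40 +0200] ... Updating process: False {u'run_generation': (5, 8)}
UPDATE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*Updating process: \w+ .*u'run_generation': \((?P<old>\d+), (?P<new>\d+)\)"
)


def update_cycles(lines: List[str]) -> List[Tuple[datetime, int, int]]:
    """Extract (timestamp, old_generation, new_generation) from 'Updating process' lines."""
    out = []
    for line in lines:
        m = UPDATE_RE.search(line)
        if m:  # other lines (Deactivating, stopping monitors) are skipped
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y %H:%M:%S %z")
            out.append((ts, int(m.group("old")), int(m.group("new"))))
    return out


def intervals_seconds(cycles: List[Tuple[datetime, int, int]]) -> List[float]:
    """Seconds between consecutive update cycles."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(cycles, cycles[1:])]
```

On the excerpt above the cycles are 15 seconds apart and the generation never settles, which is what "hung" looks like from the agent's side while the server keeps failing to handle the result.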