HDFS replication issue through Cloudera Manager

Contributor

Hello Team,

 

On a customer cluster, we are testing HDFS replication through Cloudera Manager. The replication policy is configured as follows.

[Screenshot: replication policy configuration (Sayed016_0-1650536786169.png)]

All other configuration is left at the defaults. The replication has been stuck in the state shown below for a long time.

[Screenshot: replication job status (Sayed016_1-1650536840175.png)]

We looked into the Cloudera Manager logs and see the error below occurring repeatedly. Could you please help us resolve the issue?

2022-04-21 12:27:57,199 ERROR CommandPusher-1:com.cloudera.cmf.service.AgentResultFetcher: Exception occured while handling tempfile com.cloudera.cmf.service.AgentResultFetcher@618eac09

 

Best Regards

Sayed Anisul Hoque

 

1 ACCEPTED SOLUTION

Contributor

The issue is resolved. The cause was the owner and group of the subfolders of /var/lib/cloudera-scm-server. They need to be cloudera-scm:cloudera-scm, but had somehow been changed to root:root.
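
For anyone hitting the same symptom, a check along the following lines may help confirm whether ownership is the culprit. This is only a minimal sketch, not a Cloudera tool: the function name `find_wrong_ownership` is made up for illustration, and it only lists paths whose owner or group differs from the expected one, without changing anything.

```python
import os


def find_wrong_ownership(root, uid, gid):
    """Return every path under root (root included) not owned by uid:gid.

    Hypothetical helper for illustration; it only reads metadata.
    """
    # Collect the root itself plus every directory and file below it.
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            candidates.append(os.path.join(dirpath, name))

    mismatched = []
    for path in candidates:
        # lstat so that symlinks are checked themselves, not their targets.
        st = os.lstat(path)
        if st.st_uid != uid or st.st_gid != gid:
            mismatched.append(path)
    return mismatched
```

On the Cloudera Manager server host, the expected IDs could be looked up with `pwd.getpwnam("cloudera-scm").pw_uid` and `grp.getgrnam("cloudera-scm").gr_gid`, and the function called with `/var/lib/cloudera-scm-server` as `root`, run as a user that can read the whole tree. If it reports root-owned entries, restoring the ownership (for example with `chown -R cloudera-scm:cloudera-scm` on that directory) is what fixed it in our case.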

2 REPLIES

Contributor

The logs from the CM agent on the host doing the task are shown below.

[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Launching process. one-off True, command dr/precopylistingcheck.sh, args [u'-bandwidth', u'100', u'-i', u'-m', u'20', u'-prbugpa', u'-skipAclErr', u'-update', u'-proxyuser', u'hbackup', u'-log', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp/2022-04-21_9975', u'-sequenceFilePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/fileList.seq', u'-diffRenameDeletePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/renamesDeletesList.seq', u'-sourceconf', u'source-client-conf', u'-sourceprincipal', u'hdfs/SOURCE_HOSTNAME', u'-sourcetktcache', u'source.tgt', u'-copyListingOnSource', u'-useSnapshots', u'distcp-33--26584462', u'-ignoreSnapshotFailures', u'-diff', u'-useDistCpFileStatus', u'-replaceNameservice', u'-strategy', u'dynamic', u'-filters', u'exclusion-filter.list', u'-scheduleId', u'33', u'-scheduleName', u'test-copy', u'/test-prod2-copy', u'/test-prod2-copy']
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor   WARNING  Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 2815-hdfs-precopylistingcheck-40444302'>)
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor   INFO     Triggering supervisord update.
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util         INFO     Using generic audit plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util         INFO     Creating metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util         INFO     Using specific metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util         INFO     Using generic metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process      INFO     Begin audit plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue throttling_logger INFO     (22 skipped) Scheduling a refresh for Audit Plugin for hdfs-precopylistingcheck-40444302 with count 1 pipelines names [''].
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process      INFO     Begin metadata plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process      INFO     Not creating a monitor for 2815-hdfs-precopylistingcheck-40444302: should_monitor returns false
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process      INFO     Daemon refresh complete for process 2815-hdfs-precopylistingcheck-40444302.
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin navigator_plugin INFO     Pipelines updated for Metadata Plugin: []
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin throttling_logger INFO     (22 skipped) Refreshing Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:09 +0200] 1697 Audit-Plugin navigator_plugin INFO     Pipelines updated for Audit Plugin: []
[21/Apr/2022 15:55:10 +0200] 1697 MainThread process      INFO     [2815-hdfs-precopylistingcheck-40444302] Unregistered supervisor process EXITED
[21/Apr/2022 15:55:10 +0200] 1697 MainThread supervisor   INFO     Triggering supervisord update.
[21/Apr/2022 15:55:10 +0200] 1697 MainThread throttling_logger INFO     Removed keytab /var/run/cloudera-scm-agent/process/2815-hdfs-precopylistingcheck-40444302/hdfs.keytab as a candidate to kinit from
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'running': (True, False), u'run_generation': (1, 5)}
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:29 +0200] 1697 Metadata-Plugin navigator_plugin INFO     stopping Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:29 +0200] 1697 Audit-Plugin navigator_plugin INFO     stopping Audit Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (5, 8)}
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (8, 11)}
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (11, 15)}
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (15, 19)}
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (19, 23)}
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] stopping monitors

 

The log lines below keep repeating:

[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process      INFO     [2815-hdfs-precopylistingcheck-40444302] stopping monitors

 

 
