Member since 05-21-2021
33 Posts
1 Kudos Received
3 Solutions
My Accepted Solutions
Views | Posted
---|---
782 | 06-23-2022 01:06 AM
1747 | 04-22-2022 02:24 AM
8721 | 03-29-2022 01:20 AM
12-14-2022
08:39 AM
Hello Team,

We recently upgraded from CDP 7.1.6 to CR 7.1.8. Since the upgrade, we have been unable to start the HDFS NFS Gateway role. On CDP 7.1.6, we would run the command hdfs --daemon start portmap and then start the HDFS NFS Gateway role from Cloudera Manager. After the upgrade this no longer works: the HDFS NFS Gateway role does not start after running the above command and starting the role from Cloudera Manager. We are getting the following issue:

Due to security restrictions, we are not able to start the rpcbind service. Is there a workaround to start the NFS Gateway?

Note: the mount service does not start, even though the portmap service appears to have a running process.

Best Regards
Sayed Anisul Hoque
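A quick way to see which of the gateway's listeners actually come up after starting portmap is to probe their ports. The sketch below is a plain TCP connect check and assumes the Hadoop defaults (portmap on 111, nfs.mountd.port 4242, nfs.server.port 2049); it only shows that something is listening, and rpcinfo -p localhost remains the authoritative view of what is registered with the portmapper.

```python
import socket

# Assumed defaults: portmap/rpcbind on 111, plus the Hadoop NFS Gateway defaults
# nfs.mountd.port=4242 and nfs.server.port=2049. Override if your cluster differs.
DEFAULT_PORTS = {"portmap": 111, "mountd": 4242, "nfs3": 2049}


def check_tcp_ports(host="127.0.0.1", ports=None, timeout=2.0):
    """Return {service: True/False}; True if a TCP connection to the port succeeds.

    This only shows that *something* is listening on the port; it does not
    prove that the service is registered with the portmapper.
    """
    ports = DEFAULT_PORTS if ports is None else ports
    results = {}
    for service, port in ports.items():
        try:
            # Raises OSError (e.g. ConnectionRefusedError or a timeout) if nothing answers.
            with socket.create_connection((host, port), timeout=timeout):
                results[service] = True
        except OSError:
            results[service] = False
    return results
```

For example, print(check_tcp_ports()) on the gateway host after running hdfs --daemon start portmap: if portmap is listening but mountd stays False, the gateway's mount service never bound its port, which matches the symptom in the note above.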
Labels: HDFS
06-23-2022
01:06 AM
Update: After restarting the cluster, the issue went away. All good now.
06-23-2022
12:30 AM
Hello @araujo, by the time I logged in to the node to check the entropy_avail value, it was already back to normal. The issue seems to resolve quickly: in the Cloudera alert email I can see a good status within a minute of the issue occurring. Also, in the attached screenshot you can see that the value was 1.
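Because the low value seems to last less than a minute, a single manual check can easily miss it. A minimal sketch that samples /proc/sys/kernel/random/entropy_avail at a fixed interval and reports the minimum and maximum seen; the default sample count and interval are arbitrary choices for illustration, not Cloudera recommendations.

```python
import time

ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's current entropy estimate (in bits) as an int."""
    with open(path) as f:
        return int(f.read().strip())


def sample_entropy(samples=60, interval=1.0, path=ENTROPY_PATH):
    """Read entropy_avail `samples` times, `interval` seconds apart.

    Returns (minimum, maximum, all_values) so that short dips are visible.
    """
    values = []
    for i in range(samples):
        values.append(read_entropy(path))
        if i < samples - 1:  # no need to sleep after the last sample
            time.sleep(interval)
    return min(values), max(values), values
```

For example, sample_entropy(samples=120, interval=0.5) covers one minute at half-second resolution and should catch a dip like the value of 1 in the screenshot.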
06-22-2022
01:09 PM
Hello Team,

We are frequently seeing an entropy issue in our customer cluster, as shown below. /proc/sys/kernel/random/entropy_avail returns 3754. We have also installed rng-tools on all the nodes, and the rngd service is running.

According to the document https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/installation/topics/cdpdc-data-at-rest-encryption-requirements.html#pnavId1, the ExecStart should look as follows:
ExecStart=/sbin/rngd -f -r /dev/urandom

However, ours is the default one and looks as follows:
ExecStart=/sbin/rngd -f

Can you please tell us whether updating the ExecStart will solve the issue?

Best Regards
Sayed Anisul Hoque
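To confirm which ExecStart rngd is actually using on each node, the output of systemctl cat rngd (the unit file plus any drop-ins) can be checked for the -r /dev/urandom arguments shown in the linked document. A minimal sketch, assuming the last non-empty ExecStart= line is the effective one (an empty ExecStart= clears earlier values, which is how drop-in overrides replace the command).

```python
import shlex


def effective_exec_start(unit_text):
    """Return the last ExecStart= command in the text, or None if it was cleared or absent."""
    command = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("ExecStart="):
            value = line[len("ExecStart="):].strip()
            # An empty "ExecStart=" resets the command (common in drop-in overrides).
            command = value or None
    return command


def rngd_uses_urandom(unit_text):
    """True if the effective rngd command line contains the arguments "-r /dev/urandom"."""
    command = effective_exec_start(unit_text)
    if command is None:
        return False
    args = shlex.split(command)
    # Look for the two adjacent arguments "-r" and "/dev/urandom".
    return any(a == "-r" and b == "/dev/urandom" for a, b in zip(args, args[1:]))
```

For example, pass it the stdout of subprocess.run(["systemctl", "cat", "rngd"], capture_output=True, text=True) on each node.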
Labels: Cloudera Data Platform (CDP)
06-17-2022
04:10 AM
Hello @ywu, thank you for the links; this one helps. So, if I understand correctly, there is not much we can do on the YARN side to control the size of the logs, and it has to be handled in the application itself, since the application's log files will keep growing until the disk fills up and the NodeManager goes into the decommissioned state. Is that right?
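What happens to a NodeManager when a disk fills up is governed by its disk health checker: a disk whose utilization crosses yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (assumed default 90.0) is marked bad, and the node is reported unhealthy once too few good disks remain. A minimal sketch that approximates that per-disk check; the directories you pass in are up to you (for example, the values of yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs), and the result is only an approximation of YARN's own calculation.

```python
import shutil

# Assumed YARN default for
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage.
DEFAULT_MAX_UTILIZATION_PCT = 90.0


def disk_utilization_pct(path):
    """Approximate used percentage of the filesystem holding `path`.

    Based on space available to non-root users, so reserved blocks count as used.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * (1.0 - usage.free / usage.total)


def disks_over_threshold(paths, threshold_pct=DEFAULT_MAX_UTILIZATION_PCT):
    """Return {path: utilization_pct} for every path at or above the threshold."""
    over = {}
    for path in paths:
        pct = disk_utilization_pct(path)
        if pct >= threshold_pct:
            over[path] = pct
    return over
```

Alerting at a lower threshold than YARN's (for example disks_over_threshold(dirs, threshold_pct=80.0)) gives time to act before the NodeManager stops using the disk.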
06-17-2022
03:56 AM
Hello Team,

We recently upgraded CM from version 7.2.X to 7.6.1. Since this is a production cluster, we have not restarted the cluster yet. However, we are getting alerts for the YARN HistoryServer. We could not find any issue in the YARN HistoryServer logs, but we did find errors in the Cloudera Manager agent log, shown below:
[16/Jun/2022 07:50:44 +0200] 3249 GM JOBHISTORY throttling_logger ERROR (4 skipped) Error fetching metrics at 'https://xxx.xxx.com:19890/jmx'
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/generic/metric_collectors.py", line 223, in _collect_and_parse_and_return
self._adapter.safety_valve))
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/url.py", line 305, in urlopen_with_retry_on_authentication_errors
return function()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/generic/metric_collectors.py", line 245, in _open_url
cipher_list=self._tls_cipher_list)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/url.py", line 104, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib64/python2.7/urllib2.py", line 469, in error
result = self._call_chain(*args)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 203, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 127, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 143, in retry_http_kerberos_auth
resp = self.parent.open(req)
File "/usr/lib64/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib64/python2.7/urllib2.py", line 475, in error
return self._call_chain(*args)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 360, in http_error_default
raise e
HTTPError: HTTP Error 403: Forbidden

Could you please advise how we can fix this error?

Best Regards
Sayed Anisul Hoque
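Read bottom-up, the traceback shows the agent's request to /jmx first receiving a 401, urllib2_kerberos retrying with Kerberos authentication, and the retried request then being rejected with 403. To see which endpoints and status codes are affected across the whole agent log, a minimal sketch like the following can help. The patterns are based only on the lines above, and the log location /var/log/cloudera-scm-agent/cloudera-scm-agent.log used in the usage note is an assumed default.

```python
import re
from collections import Counter

# Patterns based on the agent log lines shown above.
FETCH_ERROR = re.compile(r"Error fetching metrics at '([^']+)'")
HTTP_ERROR = re.compile(r"HTTP Error (\d{3})")


def summarize_metric_errors(lines):
    """Count metric-fetch failures per URL and HTTP error codes in agent log lines."""
    urls = Counter()
    codes = Counter()
    for line in lines:
        fetch = FETCH_ERROR.search(line)
        if fetch:
            urls[fetch.group(1)] += 1
        http = HTTP_ERROR.search(line)
        if http:
            codes[int(http.group(1))] += 1
    return urls, codes
```

Usage: with open("/var/log/cloudera-scm-agent/cloudera-scm-agent.log") as f: urls, codes = summarize_metric_errors(f). If only the JobHistory /jmx URL shows up with 403, the problem is specific to that role's HTTP endpoint rather than to the agent as a whole.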
06-13-2022
04:56 AM
Hello Team,

We had a situation where one application consumed over 1 TB of disk space, which eventually filled up the disk. We had to kill the application to free the space on this disk. To prevent this from happening again, we want to limit the storage consumption of YARN applications. Could you please share how to configure this?

Best Regards
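Independently of any YARN-side limit, it helps to spot a runaway application before the disk fills. A minimal sketch that totals disk usage per application ID; it assumes you pass parent directories whose children are application_* folders, for example the directories in yarn.nodemanager.log-dirs or &lt;local-dir&gt;/usercache/&lt;user&gt;/appcache under yarn.nodemanager.local-dirs.

```python
import os


def dir_size(path):
    """Total size in bytes of regular files under `path` (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                pass  # files can disappear while containers are running
    return total


def usage_per_application(parent_dirs):
    """Return [(application_id, bytes)] sorted by size, largest first."""
    usage = {}
    for parent in parent_dirs:
        try:
            entries = os.listdir(parent)
        except OSError:
            continue  # directory missing or unreadable on this host
        for name in entries:
            path = os.path.join(parent, name)
            if name.startswith("application_") and os.path.isdir(path):
                usage[name] = usage.get(name, 0) + dir_size(path)
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
```

Running this periodically on each NodeManager host and alerting when the top entry exceeds a size you choose would have flagged the 1 TB application well before the disk was full.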
Labels: Apache YARN
04-27-2022
03:34 AM
Hello Team,

A customer cluster is facing an issue with too many open file descriptors. The CDP version is 7.1.5.
Bad : Open file descriptors: 31,364. File descriptor limit: 32,768. Percentage in use: 95.72%. Critical threshold: 70.00%.
Can you please share how to mitigate this issue? Is it okay to increase the Maximum Process File Descriptors setting, and what would be the recommended value?

Best Regards
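The percentage in the alert is the open descriptors divided by the process's descriptor limit (31,364 / 32,768 ≈ 95.72%). A minimal, Linux-only sketch that recomputes these numbers from /proc for a given process ID; finding the PID of the affected role is left to the reader, and reading another user's /proc/&lt;pid&gt;/fd typically requires root.

```python
import os


def fd_usage_pct(open_fds, limit):
    """Percentage of the descriptor limit in use, rounded to two decimals like the alert."""
    return round(100.0 * open_fds / limit, 2)


def open_fd_count(pid):
    """Number of open file descriptors of `pid` (entries in /proc/<pid>/fd)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def max_open_files(pid):
    """Soft 'Max open files' limit from /proc/<pid>/limits, or None if unlimited or absent."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: name (3 words), soft, hard, units
                return None if soft == "unlimited" else int(soft)
    return None
```

Usage, with 12345 as a hypothetical PID of the affected role: fd_usage_pct(open_fd_count(12345), max_open_files(12345)). Tracking the count over time shows whether it plateaus (the limit is simply too low) or keeps climbing (descriptors are leaking, and raising the limit only delays the alert).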
04-22-2022
02:24 AM
The issue was resolved. The problem was the owner and group of the subdirectories of /var/lib/cloudera-scm-server. The owner and group need to be cloudera-scm:cloudera-scm, but somehow they had changed to root:root.
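To catch this kind of ownership drift before it causes problems again, a read-only audit such as the following sketch can list everything under /var/lib/cloudera-scm-server whose owner or group differs from cloudera-scm:cloudera-scm. It only reports; the fix itself is still a chown -R cloudera-scm:cloudera-scm on the affected path.

```python
import grp
import os
import pwd


def find_wrong_ownership(root, uid, gid):
    """Return [(path, uid, gid)] for root and everything below it not owned by uid:gid.

    Uses lstat, so symlinks are checked themselves and never followed.
    """
    mismatches = []

    def check(path):
        st = os.lstat(path)
        if st.st_uid != uid or st.st_gid != gid:
            mismatches.append((path, st.st_uid, st.st_gid))

    check(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Each child is checked once here; os.walk then descends into subdirectories.
        for name in dirnames + filenames:
            check(os.path.join(dirpath, name))
    return mismatches


def audit_cm_server_dir(root="/var/lib/cloudera-scm-server",
                        user="cloudera-scm", group="cloudera-scm"):
    """Resolve the user and group names to IDs, then audit `root`."""
    return find_wrong_ownership(root, pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid)
```

An empty list from audit_cm_server_dir() means every entry already has the expected owner and group.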
04-21-2022
07:18 AM
The logs from the CM agent on the host running the task are shown below.
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Launching process. one-off True, command dr/precopylistingcheck.sh, args [u'-bandwidth', u'100', u'-i', u'-m', u'20', u'-prbugpa', u'-skipAclErr', u'-update', u'-proxyuser', u'hbackup', u'-log', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp/2022-04-21_9975', u'-sequenceFilePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/fileList.seq', u'-diffRenameDeletePath', u'/user/PROXY_USER_PLACEHOLDER/.cm/distcp-staging/2022-04-21-13-55-02-50a875dd/renamesDeletesList.seq', u'-sourceconf', u'source-client-conf', u'-sourceprincipal', u'hdfs/SOURCE_HOSTNAME', u'-sourcetktcache', u'source.tgt', u'-copyListingOnSource', u'-useSnapshots', u'distcp-33--26584462', u'-ignoreSnapshotFailures', u'-diff', u'-useDistCpFileStatus', u'-replaceNameservice', u'-strategy', u'dynamic', u'-filters', u'exclusion-filter.list', u'-scheduleId', u'33', u'-scheduleName', u'test-copy', u'/test-prod2-copy', u'/test-prod2-copy']
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor WARNING Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 2815-hdfs-precopylistingcheck-40444302'>)
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue supervisor INFO Triggering supervisord update.
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using generic audit plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Creating metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using specific metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue util INFO Using generic metadata plugin for process hdfs-precopylistingcheck-40444302
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Begin audit plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue throttling_logger INFO (22 skipped) Scheduling a refresh for Audit Plugin for hdfs-precopylistingcheck-40444302 with count 1 pipelines names [''].
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Begin metadata plugin refresh
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Not creating a monitor for 2815-hdfs-precopylistingcheck-40444302: should_monitor returns false
[21/Apr/2022 15:55:04 +0200] 1697 __run_queue process INFO Daemon refresh complete for process 2815-hdfs-precopylistingcheck-40444302.
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin navigator_plugin INFO Pipelines updated for Metadata Plugin: []
[21/Apr/2022 15:55:09 +0200] 1697 Metadata-Plugin throttling_logger INFO (22 skipped) Refreshing Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:09 +0200] 1697 Audit-Plugin navigator_plugin INFO Pipelines updated for Audit Plugin: []
[21/Apr/2022 15:55:10 +0200] 1697 MainThread process INFO [2815-hdfs-precopylistingcheck-40444302] Unregistered supervisor process EXITED
[21/Apr/2022 15:55:10 +0200] 1697 MainThread supervisor INFO Triggering supervisord update.
[21/Apr/2022 15:55:10 +0200] 1697 MainThread throttling_logger INFO Removed keytab /var/run/cloudera-scm-agent/process/2815-hdfs-precopylistingcheck-40444302/hdfs.keytab as a candidate to kinit from
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'running': (True, False), u'run_generation': (1, 5)}
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:29 +0200] 1697 Metadata-Plugin navigator_plugin INFO stopping Metadata Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:29 +0200] 1697 Audit-Plugin navigator_plugin INFO stopping Audit Plugin for hdfs-precopylistingcheck-40444302 with count 0 pipelines names [].
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (5, 8)}
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (8, 11)}
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:55:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (11, 15)}
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:10 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (15, 19)}
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:25 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (19, 23)}
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:40 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
The following log lines keep repeating:
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Updating process: False {u'run_generation': (23, 27)}
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] Deactivating process (skipped)
[21/Apr/2022 15:56:55 +0200] 1697 __run_queue process INFO [2815-hdfs-precopylistingcheck-40444302] stopping monitors
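In these logs the agent keeps issuing "Updating process" for the precopylistingcheck process after it has exited, with run_generation increasing roughly every 15 seconds. A minimal sketch that extracts those transitions from the agent log, so the loop can be confirmed and timed; the regular expression is based only on the lines shown here and may need adjusting for other agent versions.

```python
import ast
import re

# Matches e.g. "[2815-hdfs-...] Updating process: False {u'run_generation': (23, 27)}"
UPDATE_LINE = re.compile(
    r"\[(?P<process>[^\]]+)\] Updating process: (?:True|False) (?P<changes>\{.*\})\s*$"
)


def run_generation_changes(lines):
    """Return [(process_name, (old, new))] for every run_generation change found."""
    changes = []
    for line in lines:
        match = UPDATE_LINE.search(line)
        if not match:
            continue
        # The dict is a Python 2 style repr (u'...' strings); literal_eval parses it safely.
        fields = ast.literal_eval(match.group("changes"))
        if "run_generation" in fields:
            changes.append((match.group("process"), tuple(fields["run_generation"])))
    return changes
```

If the returned list keeps growing for a process that has already logged "Unregistered supervisor process EXITED", the agent is stuck re-evaluating a finished one-off process rather than the task still running.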