About Sayed016

Sayed016 · ‎12-11-2024

@VidyaSargur it somewhat helped. It was failing because we had an NFS client running on that server. Since we have a customer-facing client -> server architecture for NFS, we could not start the HDFS NFS Gateway again on the same port. So, the only solution was to stop the HDFS NFS Gateway.

Sayed016 · ‎12-14-2022

Hello Team, We recently upgraded from CDP 7.1.6 to CR 7.1.8. After the upgrade, we are having issues with starting the HDFS NFS Gateway role. Earlier, on CDP 7.1.6, we would run the command - hdfs --daemon start portmap and then start the HDFS NFS Gateway role from the Cloudera Manager, however after the upgrade this is not working (that is the HDFS NFS Gateway role doesn't start after running the above command and starting the role from the Cloudera Manager). We are getting the following issue Due to some security issues, we are not able to start rpcbind service. Is there a workaround to start the NFS Gateway? Note, the mount service does not start when even though the portmap service looks to have a running process. Best Regards Sayed Anisul Hoque

Sayed016 · ‎06-23-2022

Update: After the restart of the Cluster the issue went away. All good now.

Sayed016 · ‎06-23-2022

Hello @araujo, by the time I logged in to the node to check the entropy_avail value it became good, this issue seems to resolve fast as from the Cloudera Alert mail I can see good status within the next minute after this issue occurred. Also, from the screenshot attached you can the value was 1.

Sayed016 · ‎06-22-2022

Hello Team, We are seeing a frequent entropy issue in our customer cluster as shown below. /proc/sys/kernel/random/entropy_avail returns 3754. We also installed the rng-tools in all the nodes and the service rngd is also running. After checking the document https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/installation/topics/cdpdc-data-at-rest-encryption-requirements.html#pnavId1 we can see the ExecStart looks as follows ExecStart=/sbin/rngd -f -r /dev/urandom However, our ExecStart is the default one and looks as follows ExecStart=/sbin/rngd -f can you please share if updating the ExecStart will solve the issue? Best Regards Sayed Anisul Hoque

Sayed016 · ‎06-17-2022

Hello @ywu Thank you for the links. This one helps. So, if I understand correctly, there is not much we can do to control the size of the logs from the YARN but from the application itself since the application log files will continue to grow until the disk gets filled and the NodeManager goes into the decommissioned state, is it right?

Sayed016 · ‎06-17-2022

Hello Team, We recently upgraded the CM from version 7.2.X to 7.6.1. Since this is a production cluster we didn't do a restart of Cluster yet. However, we are getting alert with YARN HistoryServer. After checking the logs, we couldn't see any issue from the YARN HistoryServer but we found issues in the Cloudera agent. Please check the logs as shown below. [16/Jun/2022 07:50:44 +0200] 3249 GM JOBHISTORY throttling_logger ERROR (4 skipped) Error fetching metrics at 'https://xxx.xxx.com:19890/jmx' Traceback (most recent call last): File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/generic/metric_collectors.py", line 223, in _collect_and_parse_and_return self._adapter.safety_valve)) File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/url.py", line 305, in urlopen_with_retry_on_authentication_errors return function() File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/generic/metric_collectors.py", line 245, in _open_url cipher_list=self._tls_cipher_list) File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/url.py", line 104, in urlopen_with_timeout return opener.open(url, data, timeout) File "/usr/lib64/python2.7/urllib2.py", line 437, in open response = meth(req, response) File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response 'http', request, response, code, msg, hdrs) File "/usr/lib64/python2.7/urllib2.py", line 469, in error result = self._call_chain(*args) File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain result = func(*args) File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 203, in http_error_401 retry = self.http_error_auth_reqed(host, req, headers) File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 127, in http_error_auth_reqed return self.retry_http_kerberos_auth(req, headers, neg_value) File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/urllib2_kerberos.py", line 143, in retry_http_kerberos_auth resp = self.parent.open(req) File "/usr/lib64/python2.7/urllib2.py", line 437, in open response = meth(req, response) File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response 'http', request, response, code, msg, hdrs) File "/usr/lib64/python2.7/urllib2.py", line 475, in error return self._call_chain(*args) File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain result = func(*args) File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/https.py", line 360, in http_error_default raise e HTTPError: HTTP Error 403: Forbidden Could you please share what can we do to fix the error? Best Regards Sayed Anisul Hoque

Sayed016 · ‎06-13-2022

Hello Team, We had a situation where one application consumed over 1 TB of disk space which eventually flooded the disk space. We had to kill this application for freeing the space on this disk. Due to this not happening in the future, we want to limit the storage consumption of the YARN application. Could you please share how to configure this? Best Regards

Sayed016 · ‎04-27-2022

Hello Team, In a customer cluster facing the issue with too many open file descriptors. The CDP version is 7.1.5 Bad : Open file descriptors: 31,364. File descriptor limit: 32,768. Percentage in use: 95.72%. Critical threshold: 70.00%. Can you please share how to mitigate this issue? Is it okay to increase the Maximum Process file descriptors and what would be the recommended value? Best Regards

Sayed016 · ‎04-22-2022

The issue was resolved. The problem was the directory owner and group in the subfolders of /var/lib/cloudera-scm-server. The owner and the group need to be cloudera-scm:cloudera-scm, somehow these values changed to root:root.

Online	Offline
Last Visited	‎12-11-2024 07:17 AM

Member Since	‎05-21-2021 03:03 AM
Last Visited	‎12-11-2024 07:17 AM
Posts	34
Kudos received	1

Cloudera Community

Re: YARN HistoryServer Web Status issue

Re: HDFS replication issue through Cloudera Manage...

Re: Spark job through Oozie is failing with - sche...

Re: Not able to start HDFS NFS Gateway after 7.1.8...

Not able to start HDFS NFS Gateway after 7.1.8 upg...

Re: YARN HistoryServer Web Status issue

Re: Frequent Entropy issue

Frequent Entropy issue

Re: How to configure YARN to limit YARN applicatio...

YARN HistoryServer Web Status issue

How to configure YARN to limit YARN applications' ...

Impala Open File Descriptors limit

Re: HDFS replication issue through Cloudera Manage...