
Ambari dashboard : HDFS and YARN alerts

Stack : Installed HDP-2.3.2.0-2950 using Ambari 2.1

The Ambari dashboard shows several alerts (I am ignoring the Accumulo alerts for now):

4234-dashboard-alerts.png

The HDFS alerts are:

4235-alert-nn-checkpoint.png

4237-alert-failed-dir-count.png

The YARN alerts include several unreachable NodeManagers. I checked one of the affected hosts and found its NodeManager stopped:

4238-nm-stopped.png

When I tried to start it, I got the following error, but I couldn't work out the root cause:

stderr:   /var/lib/ambari-agent/data/errors-1005.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py", line 35, in <module>
    BeforeAnyHook().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py", line 29, in hook
    setup_users()
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/shared_initialization.py", line 41, in setup_users
    groups = params.user_to_groups_dict[user],
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/accounts.py", line 51, in action_create
    if getattr(self.resource, option_name) != None and getattr(self.resource, option_name) != attributes[0](self):
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/accounts.py", line 35, in <lambda>
    gid=(lambda self: grp.getgrgid(self.user.pw_gid).gr_name, "-g"),
KeyError: 'getgrgid(): gid not found: 7165'
Error: Error: Unable to run the custom hook script ['/usr/bin/python2.6', '/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-1005.json', '/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-1005.json', 'INFO', '/var/lib/ambari-agent/tmp']
stdout:   /var/lib/ambari-agent/data/output-1005.txt
2016-05-16 12:54:38,599 - Group['hadoop'] {}
2016-05-16 12:54:38,600 - Group['users'] {}
2016-05-16 12:54:38,600 - Group['knox'] {}
2016-05-16 12:54:38,600 - Group['spark'] {}
2016-05-16 12:54:38,601 - User['oozie'] {'gid': 'hadoop', 'groups': ['users']}
Error: Error: Unable to run the custom hook script ['/usr/bin/python2.6', '/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-1005.json', '/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-1005.json', 'INFO', '/var/lib/ambari-agent/tmp']
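
Reading the traceback above, the last frame is `grp.getgrgid(self.user.pw_gid)` raising `KeyError` for gid 7165, and the last line in stdout before the failure is `User['oozie']`. That suggests (but does not prove) that an existing account on this host has a primary gid with no matching group entry. Below is a minimal sketch for checking this on a host using only the Python standard library; the function name `users_with_missing_primary_group` is hypothetical and is not part of Ambari.

```python
import grp
import pwd


def users_with_missing_primary_group(users=None, getgrgid=grp.getgrgid):
    """Return (user name, gid) pairs whose primary gid has no group entry.

    `users` defaults to every account visible through pwd.getpwall().
    `getgrgid` is injectable so the check can be exercised without root.
    """
    if users is None:
        users = pwd.getpwall()
    missing = []
    for user in users:
        try:
            getgrgid(user.pw_gid)
        except KeyError:
            # Same failure mode as in the traceback: the gid is unknown.
            missing.append((user.pw_name, user.pw_gid))
    return missing


if __name__ == "__main__":
    for name, gid in users_with_missing_primary_group():
        print("user %s has primary gid %d with no matching group" % (name, gid))
```

If the script reports an account, comparing that gid against the group database on a healthy node may show whether a group was deleted or renumbered on this host.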

The NodeManager log (abridged because of the attachment size limit) is attached below:

yarn-yarn-nodemanager-l1034labssssecom-part-1log.txt

The second part of the log is attached in a reply below (because of the limits on attachment size and count).

How should I proceed?

4 REPLIES

Re: Ambari dashboard : HDFS and YARN alerts

@Kaliyug Antagonist

Is this cluster running in a virtual environment?

  1. For the NameNode alert: was any of your NameNodes down for some reason?
  2. For the YARN alert: the screenshot shows a NodeManager down. Can you start it and check again?

Re: Ambari dashboard : HDFS and YARN alerts

  • There are 9 physical nodes in total (1 NameNode, 8 DataNodes).
  • The cluster has been running for more than six months but is not used frequently, so I cannot tell whether the NameNode was down at some point. I also don't understand what needs to be fixed for the NameNode checkpoint alert and the failed directory count alert.
  • As mentioned in the question, I tried to start the NodeManager on one host but it failed, and the log posted in the question doesn't help much.

Re: Ambari dashboard : HDFS and YARN alerts

  • The NameNode alert will clear automatically after some time, once the NameNode finishes processing the uncommitted transactions. The alert is displayed because a minimum threshold has been reached.
  • Can you post the NodeManager log, i.e. '/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<hostname>.log'?
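
Since the full NodeManager log may exceed the attachment limit, one option is to extract only the ERROR and FATAL entries before posting. The sketch below assumes the log lines start with a timestamp followed by the level (for example `2016-05-16 12:54:38,599 ERROR ...`), which matches the timestamp style in the Ambari output above but is an assumption for the NodeManager log itself; `error_lines` is a hypothetical helper name.

```python
import re
import sys

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger: message"
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(ERROR|FATAL)\b")


def error_lines(lines):
    """Return the lines that start a log entry at ERROR or FATAL level."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.match(line)]


if __name__ == "__main__":
    # Pass the log path as the first argument, e.g. the file named above.
    with open(sys.argv[1]) as log_file:
        for entry in error_lines(log_file):
            print(entry)
```

This keeps only the first line of each matching entry; stack-trace continuation lines are dropped, so the surrounding context in the original file is still worth checking.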

Re: Ambari dashboard : HDFS and YARN alerts

I uploaded the log in two parts (because of the limits on attachment size and count):

Part 1 is attached to the original question.

Part 2 is attached here:

yarn-yarn-nodemanager-l1034labssssecom-part-2log.txt
