Failed to execute command Start on service ZooKeeper

Explorer

Hello,

 

I got the following error log when Cloudera Manager tried to start ZooKeeper on my cluster:

 

[20/Dec/2018 11:23:35 +0000] 21974 MainThread process      ERROR    Could not evaluate resource {u'path': u'/var/lib/zookeeper', u'bytes_free_warning_threshhold_bytes': 0, u'group': u'cloudera-scm', u'user': u'cloudera-scm', u'mode': 493}
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/process.py", line 963, in _do_directory_resources
    self.osops.mkabsdir(d["path"], user=d["user"], group=d["group"], mode=d["mode"])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/os_ops.py", line 173, in mkabsdir
    if mode is not None and oct(mode) != mdata.mode:
UnboundLocalError: local variable 'mdata' referenced before assignment
[20/Dec/2018 11:23:35 +0000] 21974 MainThread process      ERROR    Could not evaluate resource {u'path': u'/var/lib/zookeeper', u'bytes_free_warning_threshhold_bytes': 0, u'group': u'cloudera-scm', u'user': u'cloudera-scm', u'mode': 493}
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/process.py", line 963, in _do_directory_resources
    self.osops.mkabsdir(d["path"], user=d["user"], group=d["group"], mode=d["mode"])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/os_ops.py", line 173, in mkabsdir
    if mode is not None and oct(mode) != mdata.mode:
UnboundLocalError: local variable 'mdata' referenced before assignment
[20/Dec/2018 11:23:35 +0000] 21974 MainThread process      ERROR    Could not evaluate resource {u'path': u'/var/log/zookeeper', u'bytes_free_warning_threshhold_bytes': 0, u'group': u'cloudera-scm', u'user': u'cloudera-scm', u'mode': 493}
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/process.py", line 963, in _do_directory_resources
    self.osops.mkabsdir(d["path"], user=d["user"], group=d["group"], mode=d["mode"])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/os_ops.py", line 173, in mkabsdir
    if mode is not None and oct(mode) != mdata.mode:
UnboundLocalError: local variable 'mdata' referenced before assignment
[20/Dec/2018 11:23:35 +0000] 21974 MainThread process      ERROR    Could not evaluate resource {u'path': u'/var/log/zookeeper/stacks', u'bytes_free_warning_threshhold_bytes': 0, u'group': u'cloudera-scm', u'user': u'cloudera-scm', u'mode': 493}
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/process.py", line 963, in _do_directory_resources
    self.osops.mkabsdir(d["path"], user=d["user"], group=d["group"], mode=d["mode"])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/os_ops.py", line 173, in mkabsdir
    if mode is not None and oct(mode) != mdata.mode:
UnboundLocalError: local variable 'mdata' referenced before assignment

Can anyone please help?

I am using the latest version of CDH and Cloudera Manager.

 

Thanks

5 REPLIES

New Contributor

I am facing the same issue. Please help me out.

New Contributor

Failed to execute command Start on service zookeeper 

 

Please, I need help with this.

Community Manager

It appears the Cloudera Manager Agent is having issues accessing the /var/lib/zookeeper directory.

 

You can try performing a hard restart of the agent: https://www.cloudera.com/documentation/enterprise/5-16-x/topics/cm_ag_agents.html

 

When starting a service, the agent prepares the configuration and then passes a request to supervisord to launch the service. A hard restart of the agent frees any resources that supervisord may still be holding. It also stops every other service running on the node, so stop any other running services from the Cloudera Manager console before performing the hard restart.


David Wilder, Community Manager



Master Guru

@Somanath,

 

We are sorry to hear that you are hitting a problem starting ZooKeeper.

Before looking at doing a hard restart, let's verify some information about the issue.

ZooKeeper can fail to start for many reasons, so we need to understand your particular issue before we can suggest the best way forward.

 

(1)

 

First, please share the section of the agent log (by default /var/log/cloudera-scm-agent/cloudera-scm-agent.log) that shows the ZooKeeper process trying to start.

 

(2)

 

Next, if you see the line "Triggering supervisord update" following the lines referring to ZooKeeper, also review the stdout.log and stderr.log files in your ZooKeeper process directory. You can list them with:

 

# ls /var/run/cloudera-scm-agent/process/`ls -lrt /var/run/cloudera-scm-agent/process/ | awk '{print $9}' |grep -i ZOOKEEPER| tail -1`/logs

 

The above lists the contents of the logs directory for the most recent ZooKeeper process. If it is empty, the supervisor was never signaled to start ZooKeeper, which means the error happened during agent processing.
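
If parsing "ls" output feels fragile, the same lookup can be sketched in Python. This is only an illustration and not part of Cloudera Manager: the function name latest_process_logs is made up here, and it assumes that the process directory names contain the service name, as the grep in the command above implies.

```python
import os


def latest_process_logs(process_root, service="ZOOKEEPER"):
    """Return (logs_dir, sorted file names) for the newest matching
    process directory, or None if no directory matches.

    Hypothetical helper: mirrors `ls -lrt ... | grep -i ... | tail -1`
    by picking the matching directory with the latest modification time.
    """
    candidates = [
        os.path.join(process_root, name)
        for name in os.listdir(process_root)
        if service.lower() in name.lower()
        and os.path.isdir(os.path.join(process_root, name))
    ]
    if not candidates:
        return None
    newest = max(candidates, key=os.path.getmtime)
    logs_dir = os.path.join(newest, "logs")
    # An empty list suggests supervisord was never asked to start the process.
    files = sorted(os.listdir(logs_dir)) if os.path.isdir(logs_dir) else []
    return logs_dir, files


if __name__ == "__main__":
    root = "/var/run/cloudera-scm-agent/process"
    if os.path.isdir(root):
        print(latest_process_logs(root))
```

Like the shell command, this needs to run as a user that can read the agent's process directory.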

 

(3)

 

What version of Cloudera Manager and CDH are you using?

Master Guru

@Somanath,

 

Based on my code review and testing, the errors originally posted in this thread are caused by a minor bug in CM 5.12 and higher that occurs only when single-user mode is configured or the agent is not running as root.

 

I opened a new internal Cloudera Jira for this issue: OPSAPS-49735.

 

In my case, although I reproduced the errors, they did not prevent the ZooKeeper server from starting. I still advise reviewing the logs to confirm why the server is failing to start.

 

On any host showing the "UnboundLocalError: local variable 'mdata' referenced before assignment" error:

 

(1)

 

Back up your os_ops.py file so you can roll back if required.

Assuming you have Python 2.7, as shown in the error posted in this thread, you can find the os_ops.py file here:

/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/os_ops.py

 

prompt> cd /usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/

prompt> cp ./os_ops.py ./os_ops.py.original

 

(2)

 

Edit os_ops.py to move "mdata = self.get_path_metadata(path)" before the "if" conditional:

 

prompt> vim os_ops.py

 

Locate the following block of code in mkabsdir():

 

 

    if os.path.isdir(path):
      # Log warnings if user/group/mode are different than what's expected
      if self.honor_users_and_groups:
        mdata = self.get_path_metadata(path)

Move this line above "if self.honor_users_and_groups:":

 

 

 

mdata = self.get_path_metadata(path)

 

The result should look like this:

 

 

    if os.path.isdir(path):
      # Log warnings if user/group/mode are different than what's expected
      mdata = self.get_path_metadata(path)
      if self.honor_users_and_groups:
        if user is not None and user != mdata.user:
          LOG.warning('Expected user %s for %s but was %s', user, path, mdata.user)
        if group is not None and group != mdata.group:
          LOG.warning('Expected group %s for %s but was %s', group, path, mdata.group)
      if mode is not None and oct(mode) != mdata.mode:
        LOG.warning('Expected mode %s for %s but was %s', oct(mode), path, mdata.mode)
      return False

Save your edits

 

This change will make sure that "mdata" is assigned a value before it is referenced.
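
To see why the position of the assignment matters, here is a stripped-down reproduction of the pattern. It is not the actual os_ops.py code: PathMetadata, get_path_metadata, check_buggy, and check_fixed are stand-ins invented for this sketch, and the metadata is hard-coded instead of being read from the filesystem.

```python
from collections import namedtuple

# Stand-in for the metadata object the agent builds (hypothetical).
PathMetadata = namedtuple("PathMetadata", ["user", "group", "mode"])


def get_path_metadata(path):
    # Fixed values for illustration; 493 decimal is 755 in octal.
    return PathMetadata("cloudera-scm", "cloudera-scm", oct(493))


def check_buggy(path, mode, honor_users_and_groups):
    if honor_users_and_groups:
        mdata = get_path_metadata(path)
    # When honor_users_and_groups is False, mdata was never assigned,
    # so the next line raises UnboundLocalError.
    return mode is not None and oct(mode) != mdata.mode


def check_fixed(path, mode, honor_users_and_groups):
    # Assign before any branch so every later reference is safe.
    mdata = get_path_metadata(path)
    if honor_users_and_groups:
        pass  # the user/group comparisons would go here
    return mode is not None and oct(mode) != mdata.mode


try:
    check_buggy("/var/lib/zookeeper", 493, False)
except UnboundLocalError as exc:
    print("buggy version failed:", exc)

print("fixed version mismatch:", check_fixed("/var/lib/zookeeper", 493, False))
```

Passing False for honor_users_and_groups corresponds to the case described earlier, where the agent is in single-user mode or is not running as root.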

 

(3)

 

Restart the agent on the host where you updated os_ops.py:

 

prompt> systemctl restart cloudera-scm-agent

or, on EL6 operating systems:

prompt> service cloudera-scm-agent restart

 

(4)

 

If the agent does not restart and it cites a Python problem, you can revert by copying the "os_ops.py.original" file over the "os_ops.py" file you edited. Restart the agent again after that.
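
If you need to repeat the backup and revert on several hosts, the two file copies can be scripted. The following is a minimal sketch using only the Python standard library. The helper names backup_file and restore_file are invented for illustration, and the ".original" suffix simply matches the cp command in step (1).

```python
import shutil


def backup_file(path, suffix=".original"):
    """Copy path to path + suffix, preserving timestamps and mode bits."""
    backup_path = path + suffix
    shutil.copy2(path, backup_path)
    return backup_path


def restore_file(path, suffix=".original"):
    """Overwrite path with the backup created by backup_file()."""
    shutil.copy2(path + suffix, path)
    return path
```

Call backup_file() on the full path to os_ops.py before editing and restore_file() on the same path to undo the edit. The agent still has to be restarted afterwards, as in step (3).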