
Unable to start zookeeper failover controller in HA cluster

Expert Contributor

Hi,

I am unable to start the ZooKeeper failover controller (ZKFC) on the HA node. As a result, both NameNodes are stuck in standby mode and the cluster, along with HDFS, is going down. The HBase RegionServer is also going down. Below is the trace from the ZooKeeper failover controller log:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/zkfc_slave.py", line 166, in <module>
    ZkfcSlave().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/zkfc_slave.py", line 68, in start
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 271, in service
    environment=hadoop_env_exports
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 258, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start zkfc'' returned 1. starting zkfc, logging to /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-sachadooptst2.corp.mirrorplus.com.out

Please help.

1 ACCEPTED SOLUTION

Expert Contributor

The issue is resolved. Below are the steps to start ZKFC manually:

hdfs zkfc -formatZK

su - hdfs

/usr/hdp/2.2.8.0-3150/hadoop/sbin/hadoop-daemon.sh start zkfc
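For reference, the recovery steps above can be sketched as one guarded script. This is a minimal sketch, not a definitive procedure: the `hadoop-daemon.sh` path and the HDP version directory (`2.2.8.0-3150`) are taken from this post and will likely differ on other clusters, and `hdfs zkfc -formatZK` will prompt before overwriting an existing `/hadoop-ha` znode.

```shell
#!/usr/bin/env bash
# Sketch of the manual ZKFC recovery from the accepted answer.
# Assumption: paths/version below come from the original post and
# may differ on your installation.

zkfc_recover() {
    # Re-create the ZKFC znode in ZooKeeper (interactive prompt if
    # /hadoop-ha already exists; add -force only if you are sure).
    hdfs zkfc -formatZK || return 1

    # Start the failover controller as the hdfs service user.
    su - hdfs -c "/usr/hdp/2.2.8.0-3150/hadoop/sbin/hadoop-daemon.sh start zkfc"
}

# Only attempt the recovery when the hdfs CLI is actually available,
# i.e. when running on a NameNode host.
if command -v hdfs >/dev/null 2>&1; then
    zkfc_recover
else
    echo "hdfs CLI not found; run this on a NameNode host" >&2
fi
```

Run this on each NameNode host that should participate in automatic failover; formatting the znode only needs to happen once.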



Master Guru

The output tells you where to find additional information. Look at the end:

/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start zkfc'' returned 1. starting zkfc, logging to /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-sachadooptst2.corp.mirrorplus.com.out

So look into "/var/log/hadoop/hdfs/hadoop-hdfs-zkfc-sachadooptst2.corp.mirrorplus.com.out" for more information.
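To inspect that file quickly, something like the following works; the log directory is from the post, and the hostname part of the filename is replaced with a glob since it differs per host (an assumption, not part of the original reply):

```shell
# Show the tail of the ZKFC daemon's .out file for the real failure.
# /var/log/hadoop/hdfs is the directory named in the original post;
# the hostname in the filename varies, hence the glob.
LOG_DIR=/var/log/hadoop/hdfs
tail -n 50 "$LOG_DIR"/hadoop-hdfs-zkfc-*.out 2>/dev/null ||
    echo "no zkfc .out file found under $LOG_DIR" >&2
```

The matching `.log` file in the same directory usually carries the full stack trace if the `.out` file only shows the startup banner.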


New Contributor

I have the same issue. My NameNode fails when starting ZKFC.

Is there a fix?