Unable to perform hbck from datanodes.

Explorer

I am unable to run hbck from the DataNodes. hbck works from the NameNode. The HBase version is the same on all nodes.

HBase 0.94.15-cdh4.7.1

Command used: # sudo -u hbase hbase hbck

I am not getting any error messages.


Mentor
You will need an HBase Gateway role defined on any host where you want to run
HBase commands. Without the local configs, the commands may not function
correctly.

Explorer

Here is the slave configuration. How can I add a Gateway role? Thanks.

======================
root@slave1:/etc/hbase/conf# cat hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://slave1/hbase_cdh471</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zookeeper1,zookeeper2,zookeeper3</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase_cdh471</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>10</value>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>4294967296</value>
</property>
<property>
<name>hbase.regionserver.ipc.address</name>
<value>slave1</value>
</property>
<property>
<name>hbase.regionserver.thread.compaction.small</name>
<value>1</value>
</property>
<property>
<name>hbase.regionserver.thread.compaction.large</name>
<value>1</value>
</property>
</configuration>
================================
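
To see which filesystem URI a host's local config hands to HBase clients, the value of a property such as hbase.rootdir can be pulled out of hbase-site.xml. The following is only a rough sketch: get_prop is a hypothetical helper name, and it assumes the one-tag-per-line layout shown in the file above (it is not a real XML parser).

```shell
# Hypothetical helper: print the <value> that follows a given <name>.
# Assumes <name> and <value> each sit on their own line, as in the
# hbase-site.xml posted above.
get_prop() {
    # $1 = property name, $2 = path to the XML file
    awk -v name="$1" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$2"
}

# Usage (path taken from the post):
#   get_prop hbase.rootdir /etc/hbase/conf/hbase-site.xml
```

Comparing the result across the host where hbck works and the hosts where it does not would show whether they point at the same NameNode address.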

Mentor
Sorry, I assumed you were using Cloudera Manager here. It appears you do have proper local configs in place (from what I understand by 'slave configuration').

Could you try running the command in DEBUG mode to see what it's hung on?

export HBASE_ROOT_LOGGER=DEBUG,console
export HADOOP_ROOT_LOGGER=DEBUG,console
sudo -E -u hbase hbase hbck

Are you also certain there's no firewall between your cluster hosts that may be preventing communication?
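
DEBUG output is verbose, so it can help to filter it down to the lines that show which host and port the client is trying to reach. A minimal sketch, assuming the Hadoop IPC client logs its connection attempts under ipc.Client (connection_lines is a hypothetical helper name):

```shell
# Hypothetical helper: keep only the IPC connection attempts and any
# connection exceptions from the DEBUG output read on stdin.
connection_lines() {
    grep -E 'ipc\.Client: (Connecting to|closing ipc connection to)|ConnectException'
}

# Usage (2>&1 because the console logger may write to stderr):
#   sudo -E -u hbase hbase hbck 2>&1 | connection_lines
```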

Explorer

Here is the output. It seems the port is refusing the connection. There is no firewall restriction. What could be the reason? Thanks.

========

root@ip-xxx.xxx.xxx.yyy:~# sudo -E -u hbase hbase hbck
15/10/01 09:42:06 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
15/10/01 09:42:06 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
15/10/01 09:42:06 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[GetGroups], always=false, type=DEFAULT, sampleName=Ops)
15/10/01 09:42:06 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
15/10/01 09:42:06 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
15/10/01 09:42:06 DEBUG security.Groups: Creating new Groups object
15/10/01 09:42:06 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
15/10/01 09:42:06 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
15/10/01 09:42:06 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
15/10/01 09:42:06 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
15/10/01 09:42:06 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
15/10/01 09:42:06 DEBUG security.UserGroupInformation: hadoop login
15/10/01 09:42:06 DEBUG security.UserGroupInformation: hadoop login commit
15/10/01 09:42:06 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: hbase
15/10/01 09:42:06 DEBUG security.UserGroupInformation: UGI loginUser:hbase (auth:SIMPLE)
15/10/01 09:42:08 DEBUG hdfs.NameNodeProxies: multipleLinearRandomRetry = null
15/10/01 09:42:08 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWritable, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@43ebf1ca
15/10/01 09:42:10 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
15/10/01 09:42:10 DEBUG ipc.Client: The ping interval is 60000 ms.
15/10/01 09:42:10 DEBUG ipc.Client: Use SIMPLE authentication for protocol ClientNamenodeProtocolPB
15/10/01 09:42:10 DEBUG ipc.Client: Connecting to ip-xxx.xxx.xxx.yyy.eu-west-1.compute.internal/xxx.xxx.xxx.yyy:8020
15/10/01 09:42:10 DEBUG ipc.Client: closing ipc connection to ip-xxx.xxx.xxx.yyy.eu-west-1.compute.internal/xxx.xxx.xxx.yyy:8020: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:510)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1291)
at org.apache.hadoop.ipc.Client.call(Client.java:1209)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy10.getListing(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy10.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:441)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1526)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1509)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:406)
at org.apache.hadoop.hbase.util.HBaseFsck.preCheckPermission(HBaseFsck.java:1430)
at org.apache.hadoop.hbase.util.HBaseFsck.exec(HBaseFsck.java:3653)
at org.apache.hadoop.hbase.util.HBaseFsck.run(HBaseFsck.java:3502)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3493)
15/10/01 09:42:10 DEBUG ipc.Client: IPC Client (1163306898) connection to ip-xxx.xxx.xxx.yyy.eu-west-1.compute.internal/xxx.xxx.xxx.yyy:8020 from hbase: closed
15/10/01 09:42:10 DEBUG ipc.Client: Stopping client

========

Mentor
Unless your NameNode is down, the only other explanation, setting aside
firewall or similar misconfigurations, is that the NameNode port (and possibly
other services' ports) is listening on a different network interface than the
internal IP one. You can verify this with netstat on the host of the service
that is refusing connections.
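
One way to do the netstat check is sketched below. It assumes the Linux net-tools column layout for `netstat -tln` (local address in column 4, state in column 6) and uses port 8020 from the log above; listen_addrs is a hypothetical helper name.

```shell
# Hypothetical helper: from `netstat -tln` output on stdin, print the local
# addresses that are in LISTEN state on the given port.
listen_addrs() {
    # $1 = port number, e.g. 8020
    awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $4 }'
}

# Usage, on the NameNode host:
#   netstat -tln | listen_addrs 8020
```

If the only address printed is a loopback one such as 127.0.0.1:8020, connections arriving on the internal IP from other hosts would be refused, whereas 0.0.0.0:8020 or the internal IP itself would accept them.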