Created on 01-05-2015 10:53 AM - edited 09-16-2022 02:17 AM
Just set up a new cluster (CDH4.7) and the HBase master won't start due to an unhandled NullPointerException. I've looked at various things (DNS, configs, etc.), but I'm unable to figure out what is wrong, and I'm hoping someone might have an idea?
Here's the exception:
2015-01-03 00:01:47,754 INFO org.apache.hadoop.hbase.master.SplitLogManager: timeout = 300000
2015-01-03 00:01:47,754 INFO org.apache.hadoop.hbase.master.SplitLogManager: unassigned timeout = 180000
2015-01-03 00:01:47,754 INFO org.apache.hadoop.hbase.master.SplitLogManager: resubmit threshold = 3
2015-01-03 00:01:47,762 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks and 0 rescan nodes
2015-01-03 00:01:47,902 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:329)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1409)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
at org.apache.hadoop.ipc.Client.call(Client.java:1238)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:970)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:960)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:239)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:206)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:199)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:748)
at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:286)
at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:327)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:444)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:148)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:133)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:572)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:432)
at java.lang.Thread.run(Thread.java:724)
2015-01-03 00:01:47,905 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2015-01-03 00:01:47,905 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
2015-01-03 00:01:47,905 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
Config is straightforward:
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://moz-prod/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.client.write.buffer</name>
    <value>2097152</value>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>100</value>
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <value>35</value>
  </property>
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
  </property>
  <property>
    <name>hbase.client.keyvalue.maxsize</name>
    <value>10485760</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
  <property>
    <name>zookeeper.znode.rootserver</name>
    <value>root-region-server</value>
  </property>
  <!--Auto Failover Configuration (zookeeper)-->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>dalmozhadoop1.dal.moz.com:2181,dalmozhadoop2.dal.moz.com:2181,dalmozhadoop3.dal.moz.com:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
The HDFS NameNodes and DataNodes work fine, as does MapReduce.
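Note that hbase.rootdir points at the HA nameservice (hdfs://moz-prod) rather than at a specific NameNode, so the HBase master's client configuration also has to be able to resolve that nameservice. A rough sketch of the standard hdfs-site.xml entries such a nameservice needs (the property names are the stock HDFS HA settings; the nn1/nn2 labels and NameNode hostnames below are placeholders, not taken from this cluster):
<!-- Sketch only: standard HDFS HA nameservice definition for moz-prod; hostnames are placeholders -->
<property>
  <name>dfs.nameservices</name>
  <value>moz-prod</value>
</property>
<property>
  <name>dfs.ha.namenodes.moz-prod</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.moz-prod.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.moz-prod.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.moz-prod</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>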
Created 01-08-2015 10:47 AM
Do you have Kerberos running on the HDFS service? Is the HMaster running on a different node than the NameNode? If so, do you have any firewall applications running, like iptables? I would look in the NN logs at the timestamp 2015-01-03 00:01:47,902 and see if there are any visible incoming requests for block locations from the HMaster. For some reason the HMaster is just not able to look up the blocks of the HBase WAL files while it's trying to assign regions.
Created 01-09-2015 02:55 PM
We don't have Kerberos running, and we're running the HMaster on the same node as the primary HDFS NameNode (we have HA set up, so there are two NameNodes). There shouldn't be any firewalls in the mix, but we did find that the DataNodes were in a different VLAN than the NameNodes, so we're in the process of moving them into the same VLAN to see if some traffic is being blocked.
I'll check the NN logs - thanks for the tip. If things start working after moving to the same VLAN, I'll let you know.
Created 01-15-2015 06:18 PM
Thanks for pointing me in the right direction on this.
I had 2 issues:
1. My ZooKeeper configs were incorrect (they did not explicitly list the ZooKeeper hosts), so the HA configuration for the NameNodes wasn't working (see the sketch below).
2. The file /etc/hadoop/conf/topology.py wasn't executable.
Once I fixed those two things it started working fine.
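For anyone hitting the same thing, here is a minimal sketch of what "listing out the ZooKeeper hosts" for NameNode HA usually looks like, assuming automatic failover via the standard ha.zookeeper.quorum property in core-site.xml (the exact property involved isn't stated above; the hosts shown are simply the ones from hbase.zookeeper.quorum earlier in the thread):
<!-- core-site.xml, sketch only: ZooKeeper ensemble used for NameNode HA automatic failover -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>dalmozhadoop1.dal.moz.com:2181,dalmozhadoop2.dal.moz.com:2181,dalmozhadoop3.dal.moz.com:2181</value>
</property>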
I found the executable issue via the NameNode logs you pointed out:
...
java.io.IOException: Cannot run program "/etc/hadoop/conf/topology.py" (in directory "/usr/lib/hadoop"): error=13, Permission denied
...
Out of curiosity: is rack awareness required for HBase and YARN?
Created 01-16-2015 05:56 AM
Glad you worked it out. No, the rack awareness script is only applicable to HDFS. HBase is abstracted from needing to worry about that low-level stuff (e.g. block placement). HBase only cares about its files (WAL logs, HFiles), and it will move those files around to attain data locality for the RegionServers, but it doesn't do anything with rack awareness.
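For completeness, a rough sketch of what a script like /etc/hadoop/conf/topology.py typically does, following the standard topology-script contract (it is called with one or more hostnames or IPs as arguments and prints one rack path per argument). The rack mapping below is a made-up placeholder, not this cluster's real layout:
#!/usr/bin/env python
# Sketch of an HDFS rack-awareness (topology) script.
# The NameNode invokes it with hostnames/IPs as arguments and reads
# one rack path per argument from stdout; unknown hosts fall back to /default-rack.
import sys

# Placeholder mapping -- real deployments usually load this from a data file.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
}

for host in sys.argv[1:]:
    print(RACKS.get(host, "/default-rack"))
The script is referenced by the topology script property in core-site.xml (net.topology.script.file.name in Hadoop 2) and must be executable by the NameNode process, e.g. chmod +x /etc/hadoop/conf/topology.py, which is exactly what the error=13 Permission denied message above was complaining about.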