
HBase not starting on CDH 4.7: unhandled exception

Explorer

Just set up a new cluster (CDH 4.7) and the HBase master won't start due to an unhandled NullPointerException.  I've looked at various things (DNS, configs, etc.), but I'm unable to figure out what is wrong, and I'm hoping someone might have an idea.

 

Here's the exception:

 

2015-01-03 00:01:47,754 INFO org.apache.hadoop.hbase.master.SplitLogManager: timeout = 300000

2015-01-03 00:01:47,754 INFO org.apache.hadoop.hbase.master.SplitLogManager: unassigned timeout = 180000

2015-01-03 00:01:47,754 INFO org.apache.hadoop.hbase.master.SplitLogManager: resubmit threshold = 3

2015-01-03 00:01:47,762 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks and 0 rescan nodes

2015-01-03 00:01:47,902 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.

org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException

        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:329)

        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1409)

        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)

        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)

        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)

        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)

        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)

        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:415)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)

        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

 

        at org.apache.hadoop.ipc.Client.call(Client.java:1238)

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)

        at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)

        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:155)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:606)

        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)

        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)

        at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)

        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:970)

        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:960)

        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:239)

        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:206)

        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:199)

        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)

        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)

        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)

        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:748)

        at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:286)

        at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:327)

        at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:444)

        at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:148)

        at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:133)

        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:572)

        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:432)

        at java.lang.Thread.run(Thread.java:724)

2015-01-03 00:01:47,905 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

2015-01-03 00:01:47,905 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads

2015-01-03 00:01:47,905 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000

 

Config is straightforward:

 

<?xml version="1.0" encoding="UTF-8"?>

 

<!--Autogenerated by Cloudera Manager-->

 

<configuration>

  <property>

    <name>hbase.rootdir</name>

    <value>hdfs://moz-prod/hbase</value>

  </property>

  <property>

    <name>hbase.cluster.distributed</name>

    <value>true</value>

  </property>

  <property>

    <name>hbase.client.write.buffer</name>

    <value>2097152</value>

  </property>

  <property>

    <name>hbase.client.pause</name>

    <value>100</value>

  </property>

  <property>

    <name>hbase.client.retries.number</name>

    <value>35</value>

  </property>

  <property>

    <name>hbase.client.scanner.caching</name>

    <value>100</value>

  </property>

  <property>

    <name>hbase.client.keyvalue.maxsize</name>

    <value>10485760</value>

  </property>

  <property>

    <name>hbase.rpc.timeout</name>

    <value>60000</value>

  </property>

  <property>

    <name>hbase.snapshot.enabled</name>

    <value>true</value>

  </property>

  <property>

    <name>hbase.security.authentication</name>

    <value>simple</value>

  </property>

  <property>

    <name>zookeeper.session.timeout</name>

    <value>60000</value>

  </property>

  <property>

    <name>zookeeper.znode.parent</name>

    <value>/hbase</value>

  </property>

  <property>

    <name>zookeeper.znode.rootserver</name>

    <value>root-region-server</value>

  </property>

  <!--Auto Failover Configuration (zookeeper)-->

  <property>

    <name>hbase.zookeeper.quorum</name>

    <value>dalmozhadoop1.dal.moz.com:2181,dalmozhadoop2.dal.moz.com:2181,dalmozhadoop3.dal.moz.com:2181</value>

  </property>

  <property>

    <name>hbase.zookeeper.property.clientPort</name>

    <value>2181</value>

  </property>

</configuration>

 

The HDFS NameNodes and DataNodes work fine, as does MapReduce.


4 REPLIES

Guru

Do you have Kerberos running on the HDFS service?  Is the HMaster running on a different node than the NameNode?  If so, do you have any firewall applications running, like iptables?  I would look in the NameNode logs at the timestamp 2015-01-03 00:01:47,902 and see if there are any visible incoming requests for block locations from the HMaster.  For some reason the HMaster is just not able to look up the blocks of the HBase WAL files while it's trying to assign regions.

Explorer

We don't have Kerberos running, and we're running the HMaster on the same node as the primary HDFS NameNode (we have HA set up, so two NameNodes).  There shouldn't be any firewalls in the mix, but we did find that the DataNodes were in a different VLAN than the NameNodes, so we're in the process of moving them to the same VLAN to see if some traffic is being blocked.

 

I'll check the NN logs - thanks for the tip.  If things start working after moving to the same VLAN, I'll let you know.

Explorer

Thanks for pointing me in the right direction on this.

 

I had 2 issues:

 

1. My configs for ZooKeeper were incorrect (they did not specifically list out the ZooKeeper hosts), so the HA configuration for the NameNodes wasn't working - see the sketch just after this list.

 

2. The file /etc/hadoop/conf/topology.py wasn't executable.
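
In case it helps anyone else, the properties involved in issue 1 are roughly the ones sketched below, in hdfs-site.xml / core-site.xml on the HBase master host. This is only an illustration - the nn1/nn2 IDs, the 8020 port, and the hostnames are placeholders, not our exact values:

<property>
  <name>dfs.nameservices</name>
  <value>moz-prod</value>
</property>
<property>
  <name>dfs.ha.namenodes.moz-prod</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.moz-prod.nn1</name>
  <value>dalmozhadoop1.dal.moz.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.moz-prod.nn2</name>
  <value>dalmozhadoop2.dal.moz.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.moz-prod</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- automatic failover is where the ZooKeeper hosts need to be listed out explicitly (core-site.xml) -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>dalmozhadoop1.dal.moz.com:2181,dalmozhadoop2.dal.moz.com:2181,dalmozhadoop3.dal.moz.com:2181</value>
</property>

Since hbase.rootdir points at hdfs://moz-prod/hbase, the HBase master can only resolve that nameservice if this client-side HA config is on its classpath.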

 

Once I fixed those two things it started working fine.

 

I found the executable issue via the NameNode logs you pointed out:

 

...

java.io.IOException: Cannot run program "/etc/hadoop/conf/topology.py" (in directory "/usr/lib/hadoop"): error=13, Permission denied

...

 

Out of curiosity - is rack awareness required for HBase & YARN?

Guru

Glad you worked it out.  No, the rack awareness script is only applicable to HDFS.  HBase is abstracted from needing to worry about that low-level stuff (e.g. block placement).  HBase only cares about its files (WAL logs, HFiles), and it will move those files around to attain data locality for the region servers, but it doesn't do anything with rack awareness.
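
For reference, the script gets hooked into HDFS through core-site.xml on the NameNode, along the lines of the snippet below (net.topology.script.file.name is the Hadoop 2 property name; older configs use topology.script.file.name), and the NameNode has to be able to execute whatever file it points at - hence the error=13 you hit:

<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.py</value>
</property>

A topology lookup failing on the NameNode side would also line up with the NullPointerException coming out of DatanodeManager.sortLocatedBlocks in your original log.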