Created 05-28-2018 07:00 AM
After enabling NameNode HA, 2 out of my 3 HBase RegionServers are not coming up. I looked at the logs and found that they throw an UnknownHostException for the nameservice.
2018-05-24 08:48:29,551 INFO [regionserver/atlhashdn02.hashmap.net/192.166.4.37:16020] regionserver.HRegionServer: STOPPED: Failed initialization
2018-05-24 08:48:29,552 ERROR [regionserver/atlhashdn02.hashmap.net/192.166.4.37:16020] regionserver.HRegionServer: Failed init
java.lang.IllegalArgumentException: java.net.UnknownHostException: clusterha
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:411)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:688)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:629)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:159)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2761)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2795)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2777)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:179)
    at org.apache.hadoop.hbase.wal.DefaultWALProvider.init(DefaultWALProvider.java:97)
    at org.apache.hadoop.hbase.wal.WALFactory.getProvider(WALFactory.java:148)
    at org.apache.hadoop.hbase.wal.WALFactory.<init>(WALFactory.java:180)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1648)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1381)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:917)
    at java.lang.Thread.run(Thread.java:745)
Created 05-29-2018 01:14 PM
It looks like the Hadoop configuration is missing from the classpath, so HBase is not able to detect the nameservice.
Can you compare the HBase config directory on a working region server and a non-working region server to see whether core-site.xml or hdfs-site.xml is missing?
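The check suggested above can be sketched as a small shell function. This is only an illustration: the function name check_hadoop_conf is made up for this example, and the config directory in the sample call (/etc/hbase/conf) is an assumption that may not match your layout. It reports whether core-site.xml and hdfs-site.xml exist in a directory and whether hdfs-site.xml mentions the nameservice (clusterha in the log above) near its dfs.nameservices property.

```shell
#!/usr/bin/env bash
# Sketch: report whether the Hadoop client configs are present in a config
# directory and whether hdfs-site.xml declares the given nameservice.
# check_hadoop_conf is a hypothetical helper, not part of HBase or Hadoop.
check_hadoop_conf() {
  local conf_dir="$1" nameservice="$2" status=0
  local f
  for f in core-site.xml hdfs-site.xml; do
    if [ -f "$conf_dir/$f" ]; then
      echo "OK       $conf_dir/$f"
    else
      echo "MISSING  $conf_dir/$f"
      status=1
    fi
  done
  # Look for the nameservice within a couple of lines of the
  # dfs.nameservices property name (a text match, not an XML parse).
  if [ -f "$conf_dir/hdfs-site.xml" ] &&
     grep -A2 '<name>dfs.nameservices</name>' "$conf_dir/hdfs-site.xml" |
       grep -q "$nameservice"; then
    echo "OK       dfs.nameservices mentions $nameservice"
  else
    echo "MISSING  dfs.nameservices entry for $nameservice"
    status=1
  fi
  return "$status"
}

# Example call (the path is an assumption; adjust to your layout):
# check_hadoop_conf /etc/hbase/conf clusterha
```

Running it on a working and a non-working region server and comparing the output should show quickly whether a file is absent on one of them. Note that files being present does not prove the RegionServer process actually loads them; that also depends on its environment and classpath.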
Created 06-01-2018 09:45 AM
I checked hbase-site.xml, hdfs-site.xml and core-site.xml. They are exactly the same on both nodes.
Created 06-02-2018 12:39 PM
Well, the configuration files were correct, but the environment was not set properly. I checked the HBase env on both nodes and found a difference. After I updated it with the following properties in Ambari, it worked:
export LD_LIBRARY_PATH=::/usr/hdp/2.6.3.0-235/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/2.6.3.0-235/hadoop/lib/native
export HADOOP_HOME=/usr/hdp/2.6.3.0-235/hadoop
export HADOOP_CONF_DIR=/usr/hdp/2.6.3.0-235/hadoop/etc/hadoop
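For anyone hitting the same problem, here is a rough sketch of how the env difference between two nodes could be spotted. It assumes you have copied each node's env script (for example hbase-env.sh; the file name and location are assumptions) to one machine. The helper names show_env_exports and compare_env_exports are hypothetical, and the script only looks at the three variables listed above.

```shell
#!/usr/bin/env bash
# Sketch: print the export lines for the three variables from an env script,
# so that copies taken from two nodes can be compared.
# Both function names are made up for this example.
show_env_exports() {
  local env_file="$1"
  grep -E '^[[:space:]]*export[[:space:]]+(LD_LIBRARY_PATH|HADOOP_HOME|HADOOP_CONF_DIR)=' "$env_file" |
    sed -E 's/^[[:space:]]+//' |
    sort
}

# Print "same" and return 0 when both files export identical values;
# otherwise print both sets of lines and return 1.
compare_env_exports() {
  local a b
  a=$(show_env_exports "$1")
  b=$(show_env_exports "$2")
  if [ "$a" = "$b" ]; then
    echo "same"
  else
    printf '%s\n--- vs ---\n%s\n' "$a" "$b"
    return 1
  fi
}

# Example call (file names are assumptions):
# compare_env_exports hbase-env.working.sh hbase-env.failing.sh
```

A missing HADOOP_CONF_DIR line on the failing node would show up as an empty or shorter block on one side of the "--- vs ---" separator.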