Ambari with Active HBase Master and HBase Master

Contributor

Hi

I have a cluster of six servers, each with four NICs. Two of the NICs on each server connect to the outside world, and the other two connect internally to the other servers in the cluster through switches. All internal traffic goes over that bond: bond0 (eno1 and eno3) carries external traffic and bond1 (eno2 and eno4) carries internal traffic. In the /etc/hosts files, the plain hostname points to the external bond, and host names of the form xxxxyyint point to the internal bond, bond1.

Everything works, except for one thing. In the HBase config I can specify the active HBase master with hbase.master.hostname=xxxx01int. My question is: how do I specify the hostname of the other HBase Master? I tried something like hbase.master.hostname=xxxx01int,xxxx03int, but that does not seem to work. The alert that I'm getting says:

Hbase Master Process - connection failed [Errno 111] Connection refused to xxxx03int:16000

When I telnet to port 16000 from xxxx01int to xxxx03int, it only works on the external IP address, not the internal IP address. It seems that the hostname command is used, and of course it reports the external hostname, not the internal one.
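The telnet test described above can be scripted so that it is easy to repeat against both names of each server. This is a minimal sketch, not part of the original post; the function name check_port is made up for illustration, and the only assumption is that a plain TCP connect is a fair stand-in for the telnet check.

```python
import socket


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and report 'open', 'refused', or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Errno 111 (ECONNREFUSED) on Linux: the host answered, but nothing
        # is listening on that address and port.
        return "refused"
    except OSError:
        # Timeouts, name-resolution failures, and routing problems land here.
        return "unreachable"
```

Calling check_port("xxxx03int", 16000) and check_port("xxxx03", 16000) from xxxx01 would show whether the master port answers on the internal name, the external name, or both. A "refused" result on only one of the two names suggests the process is bound to a single address rather than to all interfaces.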

8 REPLIES

Super Guru

First, please familiarize yourself with the write-up here: https://community.hortonworks.com/articles/24277/parameters-for-multi-homing.html

Even when you have multiple interfaces, you should have a consistent hostname for a single machine. Depending on where a client tries to access that machine from (the network path required to reach that machine), DNS must return the correct IP address so that the client can communicate with it.

For example, if the client is accessing the machine from the public network, the hostname should resolve to a public network IP address.

To have HBase listen on multiple interfaces, make sure that you specify 0.0.0.0 as the bind address, as described in the above document.
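To illustrate what binding to 0.0.0.0 means at the socket level, here is a small sketch using plain Python sockets. It is not HBase code; the helper names listen_on and can_connect are invented for the example, and the demo uses only the loopback address so that it runs on any machine.

```python
import socket


def listen_on(bind_addr: str) -> socket.socket:
    """Open a listening TCP socket at bind_addr on an OS-chosen port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_addr, 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv


def can_connect(target_ip: str, port: int) -> bool:
    """Return True if a TCP connection to target_ip:port succeeds."""
    try:
        with socket.create_connection((target_ip, port), timeout=2.0):
            return True
    except OSError:
        return False


# A wildcard listener accepts connections on every local IPv4 address,
# so reaching it through loopback works without binding to loopback.
wildcard = listen_on("0.0.0.0")
port = wildcard.getsockname()[1]
print(wildcard.getsockname()[0], can_connect("127.0.0.1", port))
wildcard.close()
```

A listener created with listen_on("192.168.1.101") would, by contrast, accept connections only on that one address, which is the situation to look for when a port answers on one network and is refused on the other.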

Contributor

Thanks Josh, let me read through this. I have confirmed that my bind address is set to 0.0.0.0.

Contributor

Hi @Josh Elser,

I had a look at the document, but I just cannot find the problem. I have gone so far as to set up my own BIND (DNS) server on one of the servers in the cluster. When I run nslookup with the internal IP, external IP, internal hostname, and external hostname, they all resolve. I think the problem is two-fold. When I specify hbase.master.dns.interface=eno2 and hbase.regionserver.dns.interface=eno2, I get the following error (which seems to be documented all over):

2017-02-07 11:23:00,418 INFO  [main] util.ServerCommandLine: vmName=OpenJDK 64-Bit Server VM, vmVendor=Oracle Corporation, vmVersion=25.111-b15
2017-02-07 11:23:00,418 INFO  [main] util.ServerCommandLine: vmInputArguments=[-Dproc_master, -XX:OnOutOfMemoryError=kill -9 %p, -Dhdp.version=2.5.3.0-37, -XX:+UseConcMarkSweepGC, -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log, -Djava.io.tmpdir=/tmp, -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -Xloggc:/var/log/hbase/gc.log-201702071122, -Xmx1024m, -Dhbase.log.dir=/var/log/hbase, -Dhbase.log.file=hbase-hbase-master-server01.xxxx.com.log, -Dhbase.home.dir=/usr/hdp/current/hbase-master/bin/.., -Dhbase.id.str=hbase, -Dhbase.root.logger=INFO,RFA, -Djava.library.path=:/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native, -Dhbase.security.logger=INFO,RFAS]
2017-02-07 11:23:00,549 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2515)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:235)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2529)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.util.DNS.getDefaultHost(DNS.java:53)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getHostname(RSRpcServices.java:922)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:867)
    at org.apache.hadoop.hbase.master.MasterRpcServices.<init>(MasterRpcServices.java:230)
    at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:581)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:540)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:411)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2510)
    ... 5 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.hadoop.net.DNS.reverseDns(DNS.java:82)
    at org.apache.hadoop.net.DNS.getHosts(DNS.java:253)
    at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:366)
    ... 21 more

When I take these parameters out, the Active Master and Standby Master start up, but on the external hostname and IP address, while the alert says that it is trying to connect to the internal hostname and internal IP address:

[root@server01 hbase]# netstat -anp | grep 16000
tcp6       0      0 172.28.200.198:16000    :::*                    LISTEN      17293/java         
tcp6       0      0 172.28.200.198:30230    172.28.200.214:16000    ESTABLISHED 17898/java        
[root@server01 hbase]#
Connection failed: [Errno 111] Connection refused to server01int.xxxx.com:16000
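In the netstat output above, the LISTEN socket for port 16000 is on 172.28.200.198, the eno1 (external) address, rather than on a wildcard address. A connection to server01int:16000 (192.168.1.101) would therefore be refused, which matches the alert. The sketch below pulls the listen address out of netstat -anp text; the function name listen_addresses is made up for the example, and the column layout is assumed to match the sample shown in the post.

```python
from typing import List


def listen_addresses(netstat_output: str, port: int) -> List[str]:
    """Return the local IPs that have a LISTEN socket on the given port."""
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 6 or fields[5] != "LISTEN":
            continue
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addresses.append(host)
    return addresses


SAMPLE = """\
tcp6       0      0 172.28.200.198:16000    :::*                    LISTEN      17293/java
tcp6       0      0 172.28.200.198:30230    172.28.200.214:16000    ESTABLISHED 17898/java
"""

print(listen_addresses(SAMPLE, 16000))  # ['172.28.200.198']
```

A process bound to all interfaces would show up here as "::" or "0.0.0.0" instead of a specific address.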

The ifconfig output looks correct: eno1 is external and eno2 is internal.

All the /etc/hosts files contain all the servers in the cluster.

[root@server01 hbase]# ifconfig -a
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.28.200.198  netmask 255.255.255.0  broadcast 172.28.200.255
        inet6 fe80::ec4:7aff:fecd:f1f0  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:cd:f1:f0  txqueuelen 1000  (Ethernet)
        RX packets 1559331  bytes 1448481094 (1.3 GiB)
        RX errors 0  dropped 120  overruns 0  frame 0
        TX packets 966299  bytes 324828255 (309.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xc7500000-c757ffff  

eno2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.101  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::ec4:7aff:fecd:f1f1  prefixlen 64  scopeid 0x20<link>
        ether 0c:c4:7a:cd:f1:f1  txqueuelen 1000  (Ethernet)
        RX packets 17758610  bytes 8386323271 (7.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19826227  bytes 15357623455 (14.3 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xc7400000-c747ffff  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 25674869  bytes 14514139121 (13.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 25674869  bytes 14514139121 (13.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@server01 hbase]# 
[root@server03 hbase]# cat /etc/hosts
127.0.0.1  localhost localhost.localdomain localhost4 localhost4.localdomain4
::1  localhost localhost.localdomain localhost6 localhost6.localdomain6
172.28.200.214   server03.xxxx.com   server03
192.168.1.103   server03int.xxxx.com   server03int 

# Entries for Ambari on internal IPs
192.168.1.106   server06int.xxxx.com   server06int
192.168.1.105   server05int.xxxx.com   server05int
192.168.1.104   server04int.xxxx.com   server04int
192.168.1.101   server01int.xxxx.com   server01int
192.168.1.102   server02int.xxxx.com   server02int
192.168.1.103   server03int.xxxx.com   server03int
# End-Entries for Ambari on internal IPs
[root@server03 hbase]# 

nslookup resolves with no problem

[root@server03 hbase]# nslookup
> server01
Server:        192.168.1.101
Address:    192.168.1.101#53

Name:    server01.xxxx.com
Address: 172.28.200.198
> server01int
Server:        192.168.1.101
Address:    192.168.1.101#53

Name:    server01int.xxxx.com
Address: 192.168.1.101
> server01.xxxx.com
Server:        192.168.1.101
Address:    192.168.1.101#53

Name:    server01.xxxx.com
Address: 172.28.200.198
> server01int.xxxx.com
Server:        192.168.1.101
Address:    192.168.1.101#53

Name:    server01int.xxxx.com
Address: 192.168.1.101
> 192.168.1.101
Server:        192.168.1.101
Address:    192.168.1.101#53

101.1.168.192.in-addr.arpa    name = server01int.xxxx.com.
> 172.28.200.198
Server:        192.168.1.101
Address:    192.168.1.101#53

198.200.28.172.in-addr.arpa    name = server01.xxxx.com.
> exit

[root@server03 hbase]# 

Any idea what I'm doing wrong here, please?

Regards

Super Guru

The interface specification you provided is used by HBase to determine its hostname. Because we're specifying the bind-all IP address (0.0.0.0) and you have multiple interfaces, we can't definitively know which one we're supposed to use for HBase services to advertise themselves. What should be happening is that the IP address for that interface is looked up, and then an rDNS call is made to figure out the hostname for that address.
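The lookup chain described above (interface, then IP address, then reverse DNS, then hostname) can be sketched as follows. This is an illustration only, not HBase code: the two dictionaries stand in for the operating system and the DNS server, filled with the values shown by ifconfig and nslookup earlier in the thread, and the function name advertised_hostname is invented for the example.

```python
import ipaddress

# Interface addresses on server01, as shown by ifconfig in the thread.
INTERFACE_IPS = {"eno1": "172.28.200.198", "eno2": "192.168.1.101"}

# PTR answers as returned by nslookup in the thread (a stand-in for real DNS).
PTR_RECORDS = {
    "101.1.168.192.in-addr.arpa": "server01int.xxxx.com.",
    "198.200.28.172.in-addr.arpa": "server01.xxxx.com.",
}


def advertised_hostname(interface: str) -> str:
    """Map interface -> IP -> reverse-lookup name -> hostname."""
    ip = ipaddress.ip_address(INTERFACE_IPS[interface])
    ptr_name = ip.reverse_pointer  # all four octets reversed + ".in-addr.arpa"
    return PTR_RECORDS[ptr_name].rstrip(".")


print(advertised_hostname("eno2"))  # server01int.xxxx.com
print(advertised_hostname("eno1"))  # server01.xxxx.com
```

With eno2 selected, the service would advertise the internal name; with no interface selected, whichever address the default lookup returns decides the advertised name.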

Part of your issue might be trying to use separate hostnames for the different networks. This inherently doesn't work in scenarios where Kerberos is configured, so perhaps you are running into convention-based issues. As a general rule, the hostname for a server should be consistent regardless of where the client is coming from. In other words, you shouldn't have "server03" and "server03int", just "server03". Your DNS server determines what IP address to return based on where the client is coming from (external or internal).
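The "one hostname, different answer depending on the client" idea is usually called split-horizon DNS. A toy model of it, using the two addresses listed for server03 in /etc/hosts and the /24 netmask from the ifconfig output, might look like this. The table layout and the resolve function are invented for illustration; a real deployment would do this in the DNS server, not in application code.

```python
import ipaddress

# Internal network, taken from eno2's address and netmask in the thread.
INTERNAL_NET = ipaddress.ip_network("192.168.1.0/24")

# One name per machine, two views of it.
VIEWS = {
    "server03.xxxx.com": {
        "internal": "192.168.1.103",
        "external": "172.28.200.214",
    },
}


def resolve(hostname: str, client_ip: str) -> str:
    """Return the address a client should get, based on where it is asking from."""
    inside = ipaddress.ip_address(client_ip) in INTERNAL_NET
    return VIEWS[hostname]["internal" if inside else "external"]


print(resolve("server03.xxxx.com", "192.168.1.101"))   # 192.168.1.103
print(resolve("server03.xxxx.com", "172.28.200.198"))  # 172.28.200.214
```

With this arrangement there is no need for a separate "server03int" name: cluster members asking from the internal network are handed the internal address for "server03".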

I'll have to look at the source code to figure out what was missing that caused the ArrayIndexOutOfBoundsException and get back to you.
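One plausible reading of the "ArrayIndexOutOfBoundsException: 3" raised in DNS.reverseDns is that the reverse-lookup name is built by splitting the address text on dots and indexing four parts, which breaks when the interface hands back an IPv6 address such as the fe80:: link-local address on eno2. That is an assumption based on the stack trace, not something confirmed against the Hadoop source in this thread. The Python sketch below reproduces the same class of failure and shows a checked alternative; both function names are invented for the example.

```python
import ipaddress
from typing import Optional


def naive_reverse_name(ip_text: str) -> str:
    """Build an in-addr.arpa name by splitting on dots, with no validation."""
    parts = ip_text.split(".")
    # Assumes exactly four dotted parts; an IPv6 address has none.
    return f"{parts[3]}.{parts[2]}.{parts[1]}.{parts[0]}.in-addr.arpa"


def checked_reverse_name(ip_text: str) -> Optional[str]:
    """Return the in-addr.arpa name for IPv4 addresses, None for anything else."""
    addr = ipaddress.ip_address(ip_text)
    if addr.version != 4:
        return None
    return addr.reverse_pointer


print(naive_reverse_name("192.168.1.101"))
try:
    naive_reverse_name("fe80::ec4:7aff:fecd:f1f1")
except IndexError as exc:
    # Python's counterpart of Java's ArrayIndexOutOfBoundsException.
    print("naive version failed:", exc)
print(checked_reverse_name("fe80::ec4:7aff:fecd:f1f1"))
```

If this reading is right, making the JVM or the OS hand out only IPv4 addresses would avoid the failing code path.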

Super Guru

[Accepted solution. The text of this reply is hidden behind the site login and is not included here.]

Contributor

Thanks @Josh Elser

I am going to disable IPv6 at the OS level, and I'll try -Djava.net.preferIPv4Stack=true as well. One thing though. You say:

e.g. for "10.0.0.1", generate "1.0.0.10.in-addr.arpa"

The way I understand the rDNS lookup, it swaps the first two and the last two octets, i.e. 192.168.1.101 becomes 168.192.101.1 (see my DNS entries below). I'm thinking that if the lookup name is built like "1.0.0.10.in-addr.arpa" for "10.0.0.1", my entries will not be "hit", and that might be the problem?

zone "168.192.in-addr.arpa" {
    type master;
    file "/etc/named/zones/db.192.168";  # 192.168.1 subnet
};
[root@server01 zones]# cat db.192.168 
$TTL    604800
@       IN      SOA     server01int.xxxx.com. admin.xxxx.com. (
              3         ; Serial
             604800     ; Refresh
              86400     ; Retry
            2419200     ; Expire
             604800 )   ; Negative Cache TTL
; name servers - NS records
    IN      NS      server01int.xxxx.com.
    IN      NS      server02int.xxxx.com.
; PTR Records
101.1   IN      PTR     server01int.xxxx.com.    ; 192.168.1.101
102.1   IN      PTR     server02int.xxxx.com.    ; 192.168.1.102
103.1   IN      PTR     server03int.xxxx.com.    ; 192.168.1.103
104.1   IN      PTR     server04int.xxxx.com.    ; 192.168.1.104
105.1   IN      PTR     server05int.xxxx.com.    ; 192.168.1.105
106.1   IN      PTR     server06int.xxxx.com.    ; 192.168.1.106
[root@server01 zones]#
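On the question of whether these PTR entries get "hit": a reverse lookup reverses all four octets, and owner names in a zone file that do not end in a dot are relative to the zone name. So "101.1" inside zone "168.192.in-addr.arpa" is really "101.1.168.192.in-addr.arpa", which is what the nslookup output earlier in the thread shows. The sketch below checks that arithmetic; the dictionary and function names are invented for the example.

```python
import ipaddress
from typing import Optional

ZONE = "168.192.in-addr.arpa"

# Owner names exactly as written in db.192.168 (relative to the zone name).
RELATIVE_PTRS = {
    "101.1": "server01int.xxxx.com.",
    "102.1": "server02int.xxxx.com.",
    "103.1": "server03int.xxxx.com.",
}


def fully_qualified(owner: str, zone: str) -> str:
    """Complete a relative owner name with the zone it lives in."""
    return f"{owner}.{zone}"


def lookup(ip_text: str) -> Optional[str]:
    """Reverse all octets of ip_text and look the result up in the zone."""
    query = ipaddress.ip_address(ip_text).reverse_pointer
    table = {fully_qualified(o, ZONE): t for o, t in RELATIVE_PTRS.items()}
    return table.get(query)


print(ipaddress.ip_address("10.0.0.1").reverse_pointer)  # 1.0.0.10.in-addr.arpa
print(lookup("192.168.1.101"))                           # server01int.xxxx.com.
```

In other words, the full-reversal form and the zone entries above describe the same name, so the records as written should match.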

Contributor

Hi @Josh Elser

I made the changes on the OS to disable IPv6, and that seems to have done the trick. Thanks so much for the suggestion.
[root@server02 ~]# vi /etc/sysctl.conf
[root@server02 ~]# sysctl -p
net.ipv4.tcp_keepalive_time = 300
net.ipv4.ip_local_port_range = 1024 65000
fs.file-max = 64000
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
[root@server02 ~]# systemctl restart network
[root@server02 ~]#
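When rolling the same change out to the other servers, it can help to confirm that both IPv6 switches were actually applied. The following sketch parses the text printed by sysctl -p, in the format shown above, and checks the two keys. The function names are invented for the example, and it checks only the printed settings, not the live kernel state.

```python
from typing import Dict


def parse_sysctl(text: str) -> Dict[str, str]:
    """Turn 'key = value' lines from sysctl output into a dictionary."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def ipv6_disabled(settings: Dict[str, str]) -> bool:
    """True only if both the 'all' and 'default' IPv6 switches are set to 1."""
    keys = (
        "net.ipv6.conf.all.disable_ipv6",
        "net.ipv6.conf.default.disable_ipv6",
    )
    return all(settings.get(key) == "1" for key in keys)


SAMPLE = """\
net.ipv4.tcp_keepalive_time = 300
net.ipv4.ip_local_port_range = 1024 65000
fs.file-max = 64000
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
"""

print(ipv6_disabled(parse_sysctl(SAMPLE)))  # True
```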

Super Guru

Great. Glad you got to the bottom of it!