Created 02-06-2017 07:56 AM
Hi
I have a cluster of 6 servers, each with 4 NICs. Two of the NICs on each server connect to the outside world, and the other two connect internally to the other servers in the cluster through switches. bond0 (eno1 and eno3) carries external traffic and bond1 (eno2 and eno4) carries all internal traffic. In the /etc/hosts files, the plain hostname points to the external bond, and host names of the form xxxxyyint point to the internal bond, bond1.
Everything works, with one exception. In the HBase config I can specify the active HBase Master with hbase.master.hostname=xxxx01int. My question is: how do I specify the hostnames when there is more than one HBase Master? I tried something like hbase.master.hostname=xxxx01int,xxxx03int, but that does not seem to work. The alert that I'm getting says:
Hbase Master Process - connection failed [Errno 111] Connection refused to xxxx03int:16000
When I telnet to port 16000 from xxxx01int to xxxx03int, it only seems to work on the external IP address, not the internal IP address. It seems that the hostname command is used, and of course it reports the external host name, not the internal one.
Created 02-06-2017 03:50 PM
First, please familiarize yourself with the write-up here: https://community.hortonworks.com/articles/24277/parameters-for-multi-homing.html
Even when you have multiple interfaces, you should have a consistent hostname for a single machine. Depending on where a client tries to access that machine from (the network path required to reach that machine), DNS must return the correct IP address so that the client can communicate with that machine.
e.g. If the client is accessing the machine from the public network, the hostname should resolve to a public network IP address.
To have HBase listen on multiple interfaces, make sure that you specify 0.0.0.0 as the bind address as declared in the above document.
Created 02-07-2017 06:13 AM
Thanks Josh, let me have a read of this. I can confirm that my bind address is set to 0.0.0.0.
Created 02-07-2017 10:27 AM
Hi,
I had a look at the document, but I just cannot seem to find the problem. I have gone so far as to set up my own BIND (DNS) server on one of the servers in the cluster. When I run nslookup with the internal IP, external IP, internal hostname and external hostname, they all resolve. I think the problem is two-fold. When I specify hbase.master.dns.interface=eno2 and hbase.regionserver.dns.interface=eno2, I get the following error (which seems to be documented all over):
2017-02-07 11:23:00,418 INFO [main] util.ServerCommandLine: vmName=OpenJDK 64-Bit Server VM, vmVendor=Oracle Corporation, vmVersion=25.111-b15
2017-02-07 11:23:00,418 INFO [main] util.ServerCommandLine: vmInputArguments=[-Dproc_master, -XX:OnOutOfMemoryError=kill -9 %p, -Dhdp.version=2.5.3.0-37, -XX:+UseConcMarkSweepGC, -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log, -Djava.io.tmpdir=/tmp, -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -Xloggc:/var/log/hbase/gc.log-201702071122, -Xmx1024m, -Dhbase.log.dir=/var/log/hbase, -Dhbase.log.file=hbase-hbase-master-server01.xxxx.com.log, -Dhbase.home.dir=/usr/hdp/current/hbase-master/bin/.., -Dhbase.id.str=hbase, -Dhbase.root.logger=INFO,RFA, -Djava.library.path=:/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native, -Dhbase.security.logger=INFO,RFAS]
2017-02-07 11:23:00,549 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2515)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:235)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2529)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.util.DNS.getDefaultHost(DNS.java:53)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getHostname(RSRpcServices.java:922)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:867)
    at org.apache.hadoop.hbase.master.MasterRpcServices.<init>(MasterRpcServices.java:230)
    at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:581)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:540)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:411)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2510)
    ... 5 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.hadoop.net.DNS.reverseDns(DNS.java:82)
    at org.apache.hadoop.net.DNS.getHosts(DNS.java:253)
    at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:366)
    ... 21 more
When I take these parameters out, the Active Master and Standby Master start up, but they listen on the external hostname and IP address, while the alert says that it is trying to connect to the internal hostname and internal IP address:
[root@server01 hbase]# netstat -anp | grep 16000
tcp6 0 0 172.28.200.198:16000 :::* LISTEN 17293/java
tcp6 0 0 172.28.200.198:30230 172.28.200.214:16000 ESTABLISHED 17898/java
[root@server01 hbase]#
Connection failed: [Errno 111] Connection refused to server01int.xxxx.com:16000
The ifconfig output seems to be correct: eno1 is external and eno2 is internal.
[root@server01 hbase]# ifconfig -a
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.28.200.198 netmask 255.255.255.0 broadcast 172.28.200.255
        inet6 fe80::ec4:7aff:fecd:f1f0 prefixlen 64 scopeid 0x20<link>
        ether 0c:c4:7a:cd:f1:f0 txqueuelen 1000 (Ethernet)
        RX packets 1559331 bytes 1448481094 (1.3 GiB)
        RX errors 0 dropped 120 overruns 0 frame 0
        TX packets 966299 bytes 324828255 (309.7 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device memory 0xc7500000-c757ffff
eno2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.101 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::ec4:7aff:fecd:f1f1 prefixlen 64 scopeid 0x20<link>
        ether 0c:c4:7a:cd:f1:f1 txqueuelen 1000 (Ethernet)
        RX packets 17758610 bytes 8386323271 (7.8 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 19826227 bytes 15357623455 (14.3 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device memory 0xc7400000-c747ffff
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 0 (Local Loopback)
        RX packets 25674869 bytes 14514139121 (13.5 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 25674869 bytes 14514139121 (13.5 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@server01 hbase]#
All the /etc/hosts files contain all the servers in the cluster:
[root@server03 hbase]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.28.200.214 server03.xxxx.com server03
192.168.1.103 server03int.xxxx.com server03int
# Entries for Ambari on internal IPs
192.168.1.106 server06int.xxxx.com server06int
192.168.1.105 server05int.xxxx.com server05int
192.168.1.104 server04int.xxxx.com server04int
192.168.1.101 server01int.xxxx.com server01int
192.168.1.102 server02int.xxxx.com server02int
192.168.1.103 server03int.xxxx.com server03int
# End-Entries for Ambari on internal IPs
[root@server03 hbase]#
nslookup resolves with no problem:
[root@server03 hbase]# nslookup
> server01
Server: 192.168.1.101
Address: 192.168.1.101#53
Name: server01.xxxx.com
Address: 172.28.200.198
> server01int
Server: 192.168.1.101
Address: 192.168.1.101#53
Name: server01int.xxxx.com
Address: 192.168.1.101
> server01.xxxx.com
Server: 192.168.1.101
Address: 192.168.1.101#53
Name: server01.xxxx.com
Address: 172.28.200.198
> server01int.xxxx.com
Server: 192.168.1.101
Address: 192.168.1.101#53
Name: server01int.xxxx.com
Address: 192.168.1.101
> 192.168.1.101
Server: 192.168.1.101
Address: 192.168.1.101#53
101.1.168.192.in-addr.arpa name = server01int.xxxx.com.
> 172.28.200.198
Server: 192.168.1.101
Address: 192.168.1.101#53
198.200.28.172.in-addr.arpa name = server01.xxxx.com.
> exit
[root@server03 hbase]#
Any idea what I'm doing wrong here, please?
Regards
Created 02-07-2017 04:00 PM
The interface specification you provided is used by HBase to determine its hostname. Because we're specifying the bind-all IP address (0.0.0.0) and you have multiple interfaces, we can't definitively know which one we're supposed to use for HBase services to advertise themselves. What should be happening is that the IP address for that interface is looked up, and then an rDNS call is made to figure out the hostname for that address.
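As a rough illustration of the lookup described above (this is not HBase's actual code), the sketch below takes the IP address of the chosen interface and asks reverse DNS for its name. The function names and the fall-back to the raw IP address are assumptions made for the example; the addresses and names are the ones posted earlier in this thread.

```python
import socket


def _system_reverse_lookup(ip):
    """Ask the system resolver for the primary name of an IP address."""
    return socket.gethostbyaddr(ip)[0]


def advertised_hostname(ip, reverse_lookup=_system_reverse_lookup):
    """Return the name a service bound to `ip` would advertise.

    `reverse_lookup` is injectable so the logic can be tried without DNS.
    Falling back to the raw IP on failure is an assumption of this sketch.
    """
    try:
        name = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip
    return name.rstrip(".")  # PTR answers may carry a trailing dot


# Stand-in for the PTR records shown by nslookup in this thread.
FAKE_PTR = {
    "192.168.1.101": "server01int.xxxx.com.",
    "172.28.200.198": "server01.xxxx.com.",
}

print(advertised_hostname("192.168.1.101", FAKE_PTR.__getitem__))
print(advertised_hostname("172.28.200.198", FAKE_PTR.__getitem__))
```

With the real resolver (the default argument), the result depends entirely on which PTR record the DNS server returns for the interface address, which is why the two networks end up advertising different names.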
Part of your issue might be trying to use separate hostnames for the different networks. This inherently doesn't work in scenarios where Kerberos is configured, so perhaps you are running into convention-based issues. As a general rule, the hostname for a server should be consistent regardless of where the client is coming from. In other words, you shouldn't have "server03" and "server03int", just "server03". Your DNS server determines what IP address to return based on where the client is coming from (external or internal).
I'll have to look at the source code to figure out what was missing that caused the ArrayIndexOutOfBoundsException and get back to you.
Created 02-07-2017 05:12 PM
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.hadoop.net.DNS.reverseDns(DNS.java:82)
    at org.apache.hadoop.net.DNS.getHosts(DNS.java:253)
    at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:366)
    ... 21 more
The code is doing the following:
1. Takes the specified interface name (or "default" if not specified) and fetches the InetAddress objects for that interface.
2. For each InetAddress, splits the result of `getHostAddress()` on "." and generates the reverse IP,
e.g. for "10.0.0.1", it generates "1.0.0.10.in-addr.arpa".
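The two steps above can be mimicked in a few lines to show why an IPv6 address would trip them up. This is a sketch of the described behaviour, not the Hadoop source; `naive_reverse_name` and `try_reverse` are hypothetical names, and the IPv6 address is the link-local one from the ifconfig output earlier in the thread.

```python
import ipaddress


def naive_reverse_name(host_address):
    """Split on '.' and reverse four octets, as in the steps described above."""
    parts = host_address.split(".")
    # Indexing parts[3] assumes a dotted IPv4 string; an IPv6 string contains
    # no dots, so the split yields a single element and index 3 is out of range.
    return f"{parts[3]}.{parts[2]}.{parts[1]}.{parts[0]}.in-addr.arpa"


def try_reverse(host_address):
    """Return the reverse name, or a marker string when the indexing fails."""
    try:
        return naive_reverse_name(host_address)
    except IndexError:
        return "index error"


print(naive_reverse_name("10.0.0.1"))
print(try_reverse("fe80::ec4:7aff:fecd:f1f0"))
# The standard library produces the same IPv4 reverse name:
print(ipaddress.ip_address("10.0.0.1").reverse_pointer)
```

The out-of-range index here is the Python analogue of `ArrayIndexOutOfBoundsException: 3`, which is consistent with the guess that an IPv6 address is being picked up on the interface.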
Sadly, there isn't any logging here to make this easy. It might be something as simple as an IPv6 address being used instead. You could try to add the following to HBASE_MASTER_OPTS and HBASE_REGIONSERVER_OPTS in hbase-env.sh:
-Djava.net.preferIPv4Stack=true
I don't have another suggestion off the top of my head.
Created 02-08-2017 05:36 AM
Thanks,
I'm going to disable IPv6 at the OS level and I'll try -Djava.net.preferIPv4Stack=true as well. One thing though: you say that for "10.0.0.1" the code generates "1.0.0.10.in-addr.arpa". The way I understand the rDNS lookup is that it swaps the first two and the last two octets, i.e. 192.168.1.101 becomes 168.192.101.1; see my DNS entries below. I'm thinking that if the lookup is done as "1.0.0.10.in-addr.arpa", my entries will not be "hit", and that might be the problem?
zone "168.192.in-addr.arpa" {
    type master;
    file "/etc/named/zones/db.192.168"; # 192.168.1 subnet
};
[root@server01 zones]# cat db.192.168
$TTL 604800
@       IN      SOA     server01int.xxxx.com. admin.xxxx.com. (
                        3         ; Serial
                        604800    ; Refresh
                        86400     ; Retry
                        2419200   ; Expire
                        604800 )  ; Negative Cache TTL
; name servers - NS records
        IN      NS      server01int.xxxx.com.
        IN      NS      server02int.xxxx.com.
; PTR Records
101.1   IN      PTR     server01int.xxxx.com.   ; 192.168.1.101
102.1   IN      PTR     server02int.xxxx.com.   ; 192.168.1.102
103.1   IN      PTR     server03int.xxxx.com.   ; 192.168.1.103
104.1   IN      PTR     server04int.xxxx.com.   ; 192.168.1.104
105.1   IN      PTR     server05int.xxxx.com.   ; 192.168.1.105
106.1   IN      PTR     server06int.xxxx.com.   ; 192.168.1.106
[root@server01 zones]#
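For reference, a reverse lookup name reverses all four octets, so 192.168.1.101 is queried as 101.1.168.192.in-addr.arpa (the nslookup output earlier in the thread shows the same name). A relative record named 101.1 inside the zone 168.192.in-addr.arpa expands to exactly that name. The sketch below checks this with Python's standard `ipaddress` module; `ptr_owner_in_zone` is a hypothetical helper written for this example.

```python
import ipaddress


def ptr_owner_in_zone(ip, zone):
    """Return the relative record name for `ip` inside a reverse zone."""
    full = ipaddress.ip_address(ip).reverse_pointer  # all four octets reversed
    suffix = "." + zone.rstrip(".")
    if not full.endswith(suffix):
        raise ValueError(f"{ip} does not belong in zone {zone}")
    return full[: -len(suffix)]


print(ipaddress.ip_address("192.168.1.101").reverse_pointer)
print(ptr_owner_in_zone("192.168.1.101", "168.192.in-addr.arpa"))
```

So the "101.1" through "106.1" entries in the zone file above line up with the reversed form, and a query built as "101.1.168.192.in-addr.arpa" would reach them.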
Created 02-08-2017 08:31 AM
Hi,
I made the changes on the OS to disable IPv6 and that seems to have done the trick. Thanks so much for the suggestion.
[root@server02 ~]# vi /etc/sysctl.conf
[root@server02 ~]# sysctl -p
net.ipv4.tcp_keepalive_time = 300
net.ipv4.ip_local_port_range = 1024 65000
fs.file-max = 64000
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
[root@server02 ~]# systemctl restart network
[root@server02 ~]#
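When repeating this change on the other servers in the cluster, a small check like the following can confirm that both IPv6 switches are set. It is only a sketch: `parse_sysctl` and `ipv6_disabled` are hypothetical helpers, and the sample text is the `sysctl -p` output shown above.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines (sysctl.conf or `sysctl -p` output) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything that is not a setting
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def ipv6_disabled(settings):
    """True when both the 'all' and 'default' IPv6 switches are set to 1."""
    keys = ("net.ipv6.conf.all.disable_ipv6", "net.ipv6.conf.default.disable_ipv6")
    return all(settings.get(key) == "1" for key in keys)


SAMPLE = """\
net.ipv4.tcp_keepalive_time = 300
net.ipv4.ip_local_port_range = 1024 65000
fs.file-max = 64000
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
"""

print(ipv6_disabled(parse_sysctl(SAMPLE)))
```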
Created 02-08-2017 04:05 PM
Great. Glad you got to the bottom of it!