Created 01-28-2016 06:53 PM
Hi,
I'm trying to install Hive on a remote server using the Ambari installation tool. I'm using Ambari 2.2 and installed HDP 2.3.0 with the following components:
I got the following error from the Hive service check:
stderr: /var/lib/ambari-agent/data/errors-64.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 106, in <module>
    HiveServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 97, in service_check
    (params.hostname, params.hive_server_port, elapsed_time))
resource_management.core.exceptions.Fail: Connection to Hive server spark01.nor1solutions.com on port 10000 failed after 295 seconds

All ports on the remote server are open, but when I run
netstat -tupln | grep -i listen | grep -i 10000
there is no process listening on that port. I've retried the installation and the same error happened again. Can anyone help with how to fix it?
Thanks,
Created 01-28-2016 06:55 PM
Try just: netstat -tupln | grep -i 10000
Any firewall rules?
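If you want to look at the rules directly on CentOS 6, something like this (a rough sketch, run as root) lists them with line numbers so a blocking rule is easy to spot:
# show the current firewall rules, numbered, without resolving hostnames
iptables -L -n --line-numbers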
Created 01-28-2016 07:56 PM
Thanks for your reply!
netstat -tupln | grep -i 10000
returned no result.
To check the firewall settings, I ran iptables -L and got the following result:
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere
DROP icmp -- anywhere anywhere icmp timestamp-request
DROP icmp -- anywhere anywhere icmp timestamp-reply
DROP icmp -- anywhere anywhere icmp address-mask-request
ACCEPT icmp -- anywhere anywhere icmp any
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
ACCEPT tcp -- 10.40.19.4 anywhere tcp dpt:5666
ACCEPT tcp -- dbbkup02.nor1solutions.com anywhere tcp dpt:5666
ACCEPT tcp -- 32.c2.9bc0.ip4.static.sl-reverse.com anywhere tcp dpt:5666
ACCEPT tcp -- nagios-dev.nor1sc.net anywhere tcp dpt:5666
ACCEPT udp -- 10.40.19.4 anywhere udp dpt:snmp
ACCEPT udp -- dbbkup02.nor1solutions.com anywhere udp dpt:snmp
ACCEPT udp -- 32.c2.9bc0.ip4.static.sl-reverse.com anywhere udp dpt:snmp
ACCEPT tcp -- 10.0.0.0/8 anywhere state NEW tcp dpt:ssh
ACCEPT tcp -- 209.119.28.98 anywhere state NEW tcp dpt:ssh
DROP all -- anywhere anywhere
I'm using CentOS 6.7.
Created 01-28-2016 08:38 PM
@Jade Liu can you run a traceroute to the spark01 server? If it's an option, also disable the firewall and try again. It's hard to tell, but you may be denying traffic to that server.
Created 01-28-2016 10:23 PM
Is the firewall down on both nodes?
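A quick way to verify on each node, assuming CentOS 6 with the iptables service as in your output:
# shows whether the firewall is currently running
service iptables status
# shows whether it will come back at boot
chkconfig --list iptables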
Created 01-28-2016 08:42 PM
Thanks for replying! Here is the content of /var/lib/ambari-agent/data/errors-64.txt:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 106, in <module>
    HiveServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/service_check.py", line 97, in service_check
    (params.hostname, params.hive_server_port, elapsed_time))
resource_management.core.exceptions.Fail: Connection to Hive server spark01.nor1solutions.com on port 10000 failed after 295 seconds
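For anyone hitting the same failure: the check that times out there is essentially just a TCP connect to port 10000, so it can be reproduced by hand. A minimal sketch, assuming bash with /dev/tcp support (CentOS 6's bash has it) and the coreutils timeout command:
# prints "open" if something accepts connections on 10000, "closed/blocked" otherwise
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/spark01.nor1solutions.com/10000' && echo open || echo closed/blocked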
Created 01-28-2016 10:08 PM
Thanks @Artem Ervits for your help!
I ran traceroute -p 10000 spark01.nor1solutions.com
and got the following result: traceroute to spark01.nor1solutions.com (10.86.36.14), 64 hops max, 52 byte packets
The routes look normal.
Then I disabled the firewall with service iptables stop on the server, and it seems Hive now starts successfully. But when I try to run any Hive query, I still get the following error:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException Internal Exception: java.sql.SQLException: Connections could not be acquired from the underlying database! Error Code: 0
H020 Could not establish connection to spark01.nor1solutions.com:10000: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
Is there anything else blocking the connection to port 10000?
Created 01-28-2016 10:22 PM
@Jade Liu Try restarting Hive with the firewall down. You need to make sure every endpoint allows traffic from every other node.
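Once it's restarted, one way to confirm HiveServer2 is actually reachable is to connect to it directly with beeline (assuming the Hive client is installed wherever you run this; the URL below is the default unsecured JDBC URL):
# should open a beeline session if port 10000 is reachable and HiveServer2 is up
beeline -u "jdbc:hive2://spark01.nor1solutions.com:10000/default"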
Created 01-28-2016 11:58 PM
Thank you @Artem Ervits! If I cannot turn off the firewall, is there any other option to fix the connection problem?
Created 01-29-2016 12:01 AM
@Jade Liu you need to add the server to the firewall policy. You can either allow all of that server's traffic in the firewall policy, or go through each port and add them individually.
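For example, on CentOS 6 with iptables something like the following could be used (a sketch only; the 10.0.0.0/8 source range is borrowed from your existing ssh rule and may need adjusting, and HDP needs more ports than just 10000):
# allow HiveServer2 (10000) from the cluster subnet, inserted ahead of the final DROP rule
iptables -I INPUT -p tcp -s 10.0.0.0/8 --dport 10000 -m state --state NEW -j ACCEPT
# or allow everything from a specific node that needs access (replace the placeholder hostname)
iptables -I INPUT -s <other-node-hostname> -j ACCEPT
# make the change survive a reboot
service iptables save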