Support Questions

cjervis · 03-24-2020

We are seeing a lot of "No route to host" errors in the DataNode logs, and Impala queries are failing because of them. The errors occur both within a node and between nodes, and on multiple nodes. The Host Inspector runs with no issues.

We did a lot of checks with the OS and network teams but couldn't find anything. Any help on this would be appreciated.

1004:DataXceiver error processing WRITE_BLOCK operation src: /192.168.225.165:55010 dst: /192.168.225.165:1004
java.net.NoRouteToHostException: No route to host

1004:DataXceiver error processing WRITE_BLOCK operation src: /192.168.225.68:35322 dst: /192.168.225.68:1004
java.net.NoRouteToHostException: No route to host

1004:DataXceiver error processing WRITE_BLOCK operation src: /192.168.225.171:40718 dst: /192.168.225.165:1004
java.net.NoRouteToHostException: No route to host
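
To see how the errors split between same-host and cross-host traffic, the log lines above can be tallied by source and destination address. This is only a sketch: the regular expression is modelled on the three lines quoted here, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the DataXceiver lines quoted in this thread
# (an assumption: other Hadoop versions may format the line differently).
PATTERN = re.compile(
    r"DataXceiver error processing (\w+) operation\s+"
    r"src: /([\d.]+):(\d+)\s+dst: /([\d.]+):(\d+)"
)


def tally_pairs(lines):
    """Count (src_ip, dst_ip) pairs across DataXceiver error lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(2), match.group(4))] += 1
    return counts


def split_local_remote(counts):
    """Separate same-host errors (src == dst) from cross-host errors."""
    local = {pair: n for pair, n in counts.items() if pair[0] == pair[1]}
    remote = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
    return local, remote
```

Feeding it the lines of a DataNode log (for example `tally_pairs(open(path))`, where `path` is wherever your log lives) shows which hosts appear most often as source or destination.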

Shelton · 03-24-2020

@npdell

"No route to host" signals that an error occurred while attempting to connect a socket to a remote address and port. Typically, the remote host cannot be reached because of an intervening firewall or because an intermediate router is down.

If you are not using static IPs, can you check on hosts 192.168.225.165, 192.168.225.68 and 192.168.225.171 that their IPs haven't changed by running

$ ifconfig

The output should match the IPs in the /etc/hosts table.
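
The comparison can also be scripted so it is easy to repeat on every node. The sketch below assumes the `ip` command from iproute2 is installed (on EL7 `ifconfig` may not be present by default); the function names and the `node1.example.com` style of entry are illustrative only.

```python
import re
import subprocess


def parse_hosts(text):
    """Return {ip: [names]} for the uncommented entries in hosts-file text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        table.setdefault(ip, []).extend(names)
    return table


def parse_ipv4_addrs(ip_output):
    """Extract IPv4 addresses from `ip -4 -o addr show` output."""
    return re.findall(r"\binet (\d+\.\d+\.\d+\.\d+)/", ip_output)


def unlisted_addrs(local_ips, hosts_table):
    """Local non-loopback addresses that have no entry in the hosts table."""
    return [ip for ip in local_ips
            if not ip.startswith("127.") and ip not in hosts_table]


def check_this_host(hosts_path="/etc/hosts"):
    """Compare this machine's interface addresses with its hosts file."""
    out = subprocess.run(["ip", "-4", "-o", "addr", "show"],
                         capture_output=True, text=True, check=True).stdout
    with open(hosts_path) as fh:
        table = parse_hosts(fh.read())
    return unlisted_addrs(parse_ipv4_addrs(out), table)
```

An empty list from `check_this_host()` means every non-loopback IPv4 address on the node has an entry in /etc/hosts; it does not check that the other nodes list the same address.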

Please do that and revert

npdell · 03-24-2020

@Shelton Thanks for responding. Why would the same error occur for communication within a single host as well, and on different ports? Any clues?

We are using static IPs (private IPs for cluster communication), and they are specified in /etc/hosts on all hosts.

Shelton · 03-24-2020

@npdell

Curious, have you checked that the firewalls are off and, most important, that these first 2 lines are present in your /etc/hosts file on each host?

Please uncomment them if they are commented out!

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# Your host entry below here #
192.168.225.68     [FQDN]        [ALIAS]
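
To check many hosts for the two loopback lines without opening each file by hand, something like the following could be run against each node's /etc/hosts contents. It is a rough sketch with hypothetical function names, and it only tests whether an uncommented line maps each loopback address to the name `localhost`.

```python
def loopback_names(hosts_text):
    """Collect the names mapped to 127.0.0.1 and ::1 on uncommented lines."""
    names = {"127.0.0.1": [], "::1": []}
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments
        if fields and fields[0] in names:
            names[fields[0]].extend(fields[1:])
    return names


def missing_loopback(hosts_text):
    """Loopback addresses with no active line mapping them to 'localhost'."""
    found = loopback_names(hosts_text)
    return [ip for ip, mapped in found.items() if "localhost" not in mapped]
```

For example, `missing_loopback(open("/etc/hosts").read())` returns an empty list when both lines are present in the form shown above.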

Maybe also try pinging and SSHing from one host to another.
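
Alongside ping and ssh, a TCP connect to the DataNode port itself can tell "No route to host" apart from a refused connection or a timeout. The sketch below uses only the Python standard library; the labels it returns are my own naming, and port 1004 is taken from the log lines in the question. Note that a firewall rule that rejects with an ICMP host-prohibited reply looks the same to the client as a missing route.

```python
import errno
import socket


def classify(exc):
    """Map a socket error to a coarse diagnosis label."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    code = getattr(exc, "errno", None)
    if code == errno.EHOSTUNREACH:
        return "no_route_to_host"  # what Java reports as NoRouteToHostException
    if code == errno.ENETUNREACH:
        return "network_unreachable"
    if code == errno.ECONNREFUSED:
        return "connection_refused"  # host reachable, nothing listening
    return "other"


def probe(host, port, timeout=3.0):
    """Try a TCP connect and report 'ok' or a diagnosis label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify(exc)
```

Running `probe("192.168.225.165", 1004)` from each node, including from 192.168.225.165 itself, would show whether the failure reproduces outside Hadoop.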

Please revert

npdell · 03-24-2020

We do have the entries below, and we have confirmed there are no firewall rules. One more thing I forgot to mention: we started seeing these issues when we started upgrading the OS on the cluster nodes from OEL 6.x to OEL 7.x.

But, looking at the logs, this seems to be happening on both types of host.

127.0.0.1 localhost.localdomain localhost
# special IPv6 addresses
::1 localhost6.localdomain6 localhost6

fe00::0 ipv6-localnet

ff00::0 ipv6-mcastprefix
ff02::1 ipv6-allnodes
ff02::2 ipv6-allrouters
ff02::3 ipv6-allhosts

Shelton · 03-25-2020

@npdell

Before starting the upgrade, did you by any chance validate the upgrade path against the Cloudera support matrix? This should have been your first reference source.

npdell · 03-26-2020

Yes, all of that was done.

asankaran · 08-17-2020

@npdell curious to know if you have fixed the above issue and what you did.