Community Articles

gsharma · 01-19-2017

DESCRIPTION:

We received frequent connection timeout alerts for a JournalNode. Checking connectivity to its HTTP port with curl produced the output below.

curl -v http://123.example.com:8480 --max-time 4 | tail -4
* About to connect() to 123.example.com:8480 port 8480 (#0)
*   Trying 10.24.16.11... connected
* Connected to 123.example.com (10.24.16.11) port 8480 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: 123.example.com:8480
> Accept: */*
>
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0* Operation timed out after 4000 milliseconds with 0 bytes received
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0* Closing connection #0
curl: (28) Operation timed out after 4000 milliseconds with 0 bytes received
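
The manual check above can be wrapped in a small helper so the result is easier to act on in a monitoring script. This is only a sketch: the function names probe_jn and describe_curl_exit are hypothetical, and the only thing it relies on is curl's documented exit codes (7 for a failed connect, 28 for a timeout).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop or curl): probe an HTTP endpoint
# with a hard time limit and translate curl's exit code into a short label.

describe_curl_exit() {
    case $1 in
        0)  echo "OK" ;;              # request completed
        7)  echo "CONNECT_FAILED" ;;  # could not connect to host
        28) echo "TIMEOUT" ;;         # --max-time limit reached
        *)  echo "ERROR($1)" ;;       # any other curl failure
    esac
}

probe_jn() {
    url=$1
    limit=${2:-4}   # overall time limit in seconds, default 4 as in the article
    # -s hides the progress meter, -o /dev/null discards the response body
    curl -s -o /dev/null --max-time "$limit" "$url"
    describe_curl_exit "$?"
}
```

For the case in this article, `probe_jn http://123.example.com:8480 4` would be expected to print TIMEOUT, since the connection is accepted but no bytes are returned within the limit.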

Running netstat for port 8480 then showed a large number of connections in the CLOSE_WAIT state.

[root@123 ~]# netstat -putane | grep -i 8480 
tcp 0 0 0.0.0.0:8480 0.0.0.0:* LISTEN 72383 1586576877 1719/java 
tcp 1 0 10.24.16.11:8480 10.24.17.11:46572 CLOSE_WAIT 72383 1587407492 1719/java 
tcp 1 0 10.24.16.11:8480 10.24.17.11:57944 CLOSE_WAIT 72383 1586744345 1719/java 
tcp 1 0 10.24.16.11:8480 10.24.17.11:57462 CLOSE_WAIT 72383 1586708412 1719/java

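CLOSE_WAIT means the remote peer has already closed its end of the connection, but the local application (here the JournalNode java process) has not yet closed its own socket.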
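
To track how many sockets are stuck rather than reading the netstat output by eye, the count can be extracted with awk. This is a minimal sketch: count_close_wait is a hypothetical helper name, and it assumes the Linux netstat column layout shown above (local address in column 4, state in column 6).

```shell
#!/bin/sh
# Hypothetical helper: count CLOSE_WAIT sockets whose LOCAL port matches $1.
# Reads netstat-style lines on stdin so it can also be run on saved output.
count_close_wait() {
    awk -v port="$1" '
        # $4 = local address (ip:port), $6 = TCP state
        $1 ~ /^tcp/ && $6 == "CLOSE_WAIT" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}
```

Usage: `netstat -tan | count_close_wait 8480`. A count that keeps growing between runs suggests the process is not closing sockets that its peers have already closed.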

ROOT CAUSE:

An edits_in_progress file had been stuck as an orphan for the last two months, even though the edits recorded in it had already been captured in other completed edits files. Because of this, connections on port 8480 of the affected JournalNode process were accumulating in CLOSE_WAIT, as the sockets were not being closed properly.

SOLUTION:

Removed the orphan edits_in_progress file and restarted journal nodes.
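
Before removing anything, it helps to list candidate orphan files and compare them against the completed edits files. The sketch below only lists files and never deletes them. The helper name find_stale_inprogress is hypothetical. It also assumes that the files are named edits_inprogress_<txid> on disk, and that the directory you pass is the "current" directory under your configured JournalNode edits directory; both assumptions should be checked against your cluster.

```shell
#!/bin/sh
# Hypothetical helper: list in-progress edit files not modified for more than
# N days. It only prints paths; removal is left as a manual, reviewed step.
find_stale_inprogress() {
    dir=$1          # assumption: <journalnode edits dir>/<nameservice>/current
    days=${2:-1}    # age threshold in days, default 1
    find "$dir" -type f -name 'edits_inprogress_*' -mtime +"$days" -print
}
```

Usage: `find_stale_inprogress /path/to/journal/current 30`. The path here is a placeholder. An in-progress file that has not been written to for weeks, while the JournalNode is otherwise serving a live cluster, is a candidate for the orphan described above. Back it up before removing it.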

spanchan · 06-20-2018

This worked.

Frequent journal node connection timeout alerts