We received frequent alerts for connection timeouts to a Journal node. A connectivity check with curl produced the output below.
curl -v http://123.example.com:8480 --max-time 4 | tail -4
* About to connect() to 123.example.com:8480 port 8480 (#0)
*   Trying 10.24.16.11... connected
* Connected to 123.example.com (10.24.16.11) port 8480 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: 123.example.com:8480
> Accept: */*
>
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0* Operation timed out after 4000 milliseconds with 0 bytes received
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0* Closing connection #0
curl: (28) Operation timed out after 4000 milliseconds with 0 bytes received
Checking port 8480 with netstat showed a large number of connections in the CLOSE_WAIT state.
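The netstat check can be sketched as a small helper that counts CLOSE_WAIT sockets on a given local port. This is a minimal sketch, not the exact command used during the incident: the function name count_close_wait is hypothetical, and it assumes the Linux `netstat -ant` column layout (local address in column 4, state in column 6).

```shell
# Count sockets in CLOSE_WAIT whose local address ends in the given port.
# Reads `netstat -ant`-style lines from stdin.
# count_close_wait is a hypothetical helper name, not part of any tool.
count_close_wait() {
    port="$1"
    awk -v port="$port" '
        # Field 4 is the local address (ip:port), field 6 is the TCP state.
        $6 == "CLOSE_WAIT" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Usage on the Journal node host:
#   netstat -ant | count_close_wait 8480
```

A count that keeps growing between runs suggests the process is not closing sockets after the remote side has disconnected.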
The cause was an orphaned edits_in_progress file that had been stuck for the last two months, even though the edits recorded in it had already been captured in other completed edits files. Because of this, connections on port 8480 of the affected Journal node process were left in CLOSE_WAIT, as the sockets were not being closed properly.
SOLUTION:
Removed the orphaned edits_in_progress file and restarted the Journal nodes.
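A stale in-progress segment like the one described above can be located with find before anything is removed. The sketch below only lists candidates and deletes nothing. It assumes the on-disk file name follows the HDFS pattern edits_inprogress_<txid>; the function name find_stale_inprogress and the directory in the usage line are hypothetical.

```shell
# List edit-log segments still marked in-progress that have not been
# modified for more than the given number of days.
# find_stale_inprogress is a hypothetical helper name.
find_stale_inprogress() {
    dir="$1"
    days="$2"
    find "$dir" -type f -name 'edits_inprogress_*' -mtime +"$days"
}

# Usage; /data/jn/mycluster/current is a hypothetical Journal node edits path:
#   find_stale_inprogress /data/jn/mycluster/current 30
```

Before removing any file this reports, confirm that its transactions are already covered by a completed edits file, as was the case in this incident. A running Journal node normally keeps one current in-progress segment, which the age filter should exclude.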