Support Questions

Find answers, ask questions, and share your expertise

YARN resource manager "HTTP request sent, awaiting response..."

avatar

we have Hadoop cluster with active/stand by resource manager services the active resource manager is on master1 machine and the stand by resource manager is on master2 machine

in our cluster YARN service that include both resource manager services is managing 276 node manager component on workers machines

from Ambari WEB UI alerts ( Alerts for Resource Manager ) we notice about the following

Resource Manager Web UI
Connection failed to http://master2.jupiter.com:8088(timed out)


we start to debug the issue by wget with port 8088 , and we found that process is hang on - HTTP request sent, `awaiting response... No data received`.

example from resource manager machine

wget --debug http://master2.jupiter.com:8088
DEBUG output created by Wget 1.14 on Linux-gnu.

URI encoding = ‘UTF-8’
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2024-02-21 10:13:42-- http://master2` .jupiter.com:8088/
Resolving master2.jupiter.com (master2.jupiter.com)... 192.9.201.169
Caching master2.jupiter.com => 192.9.201.169
Connecting to master2.jupiter.com (master2.jupiter.com)|192.9.201.169|:8088... connected.
Created socket 3.
Releasing 0x0000000000a0da00 (new refcount 1).

---request begin---
GET / HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Host: master2.jupiter.com:8088
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 21 Feb 2024 10:13:42 GMT
Date: Wed, 21 Feb 2024 10:13:42 GMT
Pragma: no-cache
Expires: Wed, 21 Feb 2024 10:13:42 GMT
Date: Wed, 21 Feb 2024 10:13:42 GMT
Pragma: no-cache
Content-Type: text/plain; charset=UTF-8
X-Frame-Options: SAMEORIGIN
Location: http://master1.jupiter.com:8088/
Content-Length: 43
Server: Jetty(6.1.26.hwx)

---response end---
307 TEMPORARY_REDIRECT
Registered socket 3 for persistent reuse.
URI content encoding = ‘UTF-8’
Location: http://master1.jupiter.com:8088/ [following]
Skipping 43 bytes of body: [This is standby RM. The redirect url is: /
] done.
URI content encoding = None
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2024-02-21 10:13:42-- http://master1.jupiter.com:8088/
conaddr is: 192.9.201.169
Resolving master1.jupiter.com (master1.jupiter.com)... 192.9.66.14
Caching master1.jupiter.com => 192.9.66.14
Releasing 0x0000000000a0f320 (new refcount 1).
Found master1.jupiter.com in host_name_addresses_map (0xa0f320)
Connecting to master1.jupiter.com (master1.jupiter.com)|192.9.66.14|:8088... connected.
Created socket 4.
Releasing 0x0000000000a0f320 (new refcount 1).
.
.
.

---response end---
302 Found
Disabling further reuse of socket 3.
Closed fd 3
Registered socket 4 for persistent reuse.
URI content encoding = ‘UTF-8’
Location: http://master1.jupiter.com:8088/cluster [following]
] done.
URI content encoding = None
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
--2024-02-21 10:27:07-- http://master1.jupiter.com:8088/cluster
Reusing existing connection to master1.jupiter.com:8088.
Reusing fd 4.

---request begin---
GET /cluster HTTP/1.1
User-Agent: Wget/1.14 (linux-gnu)
Accept: */*
Host: master1.jupiter.com:8088
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Wed, 21 Feb 2024 10:30:23 GMT
Date: Wed, 21 Feb 2024 10:30:23 GMT
Pragma: no-cache
Expires: Wed, 21 Feb 2024 10:30:23 GMT
Date: Wed, 21 Feb 2024 10:30:23 GMT
Pragma: no-cache
Content-Type: text/html; charset=utf-8
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
Server: Jetty(6.1.26.hwx)

---response end---
200 OK
URI content encoding = ‘utf-8’
Length: unspecified [text/html]
Saving to: ‘index.html’

[ <=> ] 1,018,917 --.-K/s in 0.04s

2024-02-21 10:31:31 (24.0 MB/s) - ‘index.html’ saved [1018917]


as we can see above wget completed after very long time around ~ 20 min instead to completed the process in one or two second


we can take tcpdump as

tcpdump -vv -s0 tcp port 8088 -w /tmp/why_8088_hang.pcap


but I want to understand if there are better simple ways to understand why we get HTTP request sent, awaiting response... , and maybe its related to resource manager service

Michael-Bronson
1 REPLY 1

avatar
Expert Contributor

Hi @mike_bronson7 
Please check that if the RM is working fine or is it down. Please also check your zookeeper.

Please check with telnet if you can connect to RMhost from other host.