Created on 10-21-2019 09:07 PM - last edited on 10-21-2019 10:30 PM by ask_bill_brooks
I have had a 5-node (1 master) Cloudera cluster running for the past week, and all of a sudden I have lost web UI access to Cloudera Manager.
I am able to work with other services such as Hive, HBase, Spark, and NiFi. The status shows that both the Cloudera agent and the Cloudera server are running. I can run curl against port 7180 on the master node and I get output.
I can't figure out what the problem could be.
Created on 10-21-2019 10:33 PM - edited 10-21-2019 10:34 PM
Can you share your CM logs? A usual first check is the file system status:
$ df -h
Then try restarting the CM server and the agents. Most importantly, check and share the agent logs too.
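As a rough sketch of the log check suggested above: the helper below (a hypothetical name, `recent_errors`) prints the most recent ERROR lines from a log file. The paths in the comments are the usual package-install locations for the CM server and agent logs; that is an assumption, so adjust them if your installation differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last N lines containing ERROR from a log file.
# N defaults to 20.
recent_errors() {
  local file="$1" count="${2:-20}"
  grep -F 'ERROR' "$file" | tail -n "$count"
}

# Example usage on the CM host (assumed default log locations):
#   recent_errors /var/log/cloudera-scm-server/cloudera-scm-server.log
#   recent_errors /var/log/cloudera-scm-agent/cloudera-scm-agent.log
#
# Restarting the services (run the agent restart on each node):
#   sudo systemctl restart cloudera-scm-server
#   sudo systemctl restart cloudera-scm-agent
```

The output of the two `recent_errors` calls is usually enough to paste into a reply here.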
Created 10-22-2019 12:23 AM
@Shelton , I have checked for space.
[root@ip-172-31-24-21 log]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 30G 13G 18G 43% /
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 41M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
cm_processes 16G 4.8M 16G 1% /run/cloudera-scm-agent/process
tmpfs 3.1G 0 3.1G 0% /run/user/1000
Also, the services have been restarted several times. The port is listening, and running the curl command on the server returns good output.
Created 10-22-2019 04:56 AM
Created 10-22-2019 08:13 AM
Created on 10-22-2019 09:38 AM - edited 10-22-2019 09:44 AM
A recurring exception appears in the CM server logs:
2019-10-20 17:32:34,687 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: (11 skipped) Unable to retrieve remote parcel repository manifest
java.util.concurrent.ExecutionException: java.net.ConnectException: connection timed out: archive.cloudera.com/151.101.188.167:443
This can happen if you need an http_proxy to reach the public web, or if the hosts are on a private network.
CM is currently trying to reach the archive URL to download parcels, because that was the method used when installing CM, and it is failing to do so.
Try running the command below on the CM node and let us know the output:
wget https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
If you want to set a proxy, this can be done under Administration > Search for 'Proxy'.
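Before changing the proxy setting in CM, it may help to see what the shell on the CM node is already using. A minimal sketch: `show_proxy_env` is a hypothetical helper name, and `proxy.example.com:3128` is a placeholder, not a value from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list proxy-related environment variables, if any.
show_proxy_env() {
  env | grep -iE '^(http|https|no)_proxy=' || echo "no proxy variables set"
}

# Example usage on the CM node:
#   show_proxy_env
#
# Print only the HTTP status code for the parcel manifest, first directly:
#   curl -sS -o /dev/null -w '%{http_code}\n' --connect-timeout 10 \
#     https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
# and then through a proxy (placeholder address):
#   curl -sS -o /dev/null -w '%{http_code}\n' --connect-timeout 10 \
#     -x http://proxy.example.com:3128 \
#     https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
```

If the direct request times out but the proxied one returns a status code, the node most likely needs the proxy configured in CM as well.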
Created 10-22-2019 10:32 AM
As @ssulav pointed out, your problem stems from a network access problem to this server:
http://ip-172-31-24-21.us-west-1.compute.internal
Resolve that and report back.
Created 10-22-2019 12:31 PM
Below is the error I am getting.
wget https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
--2019-10-22 19:25:24-- https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
Resolving archive.cloudera.com (archive.cloudera.com)... 151.101.40.167
Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.40.167|:443... failed: Connection timed out.
Retrying.
--2019-10-22 19:27:32-- (try: 2) https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.40.167|:443...
I am able to connect to this server using ssh, and I have the same security group settings on AWS for all 5 machines in my cluster. I am not sure why the server stopped working when I made no additional changes.
Created 10-22-2019 12:59 PM
Created 10-22-2019 01:12 PM
There is no firewall installed on the EC2 instance and the security group rules (that are common for all nodes) are configured to allow all traffic from my IP.
Created 10-22-2019 02:15 PM
Hi Axe,
You mentioned: "I am able to run the Curl command on 7180 on the Master node and I get the output."
When you say you get the output, do you mean the curl response is the correct CM web UI content? If yes, the web UI part of the CM server itself is working fine. Can you try curling port 7180 on the master node from a different machine, including the machine whose web browser you are using, and see whether you get the correct response?
Created 10-22-2019 02:59 PM
Curl Response from Cloudera Manager Host.
[root@ip-172-31-24-21 etc]# curl -u admin:admin http://ec2-18-144-47-252.us-west-1.compute.amazonaws.com:7180 -v
* About to connect() to ec2-18-144-47-252.us-west-1.compute.amazonaws.com port 7180 (#0)
* Trying 172.31.24.21...
* Connected to ec2-18-144-47-252.us-west-1.compute.amazonaws.com (172.31.24.21) port 7180 (#0)
* Server auth using Basic with user 'admin'
> GET / HTTP/1.1
> Authorization: Basic YWRtaW46YWRtaW4=
> User-Agent: curl/7.29.0
> Host: ec2-18-144-47-252.us-west-1.compute.amazonaws.com:7180
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Tue, 22 Oct 2019 21:55:52 GMT
< Set-Cookie: CLOUDERA_MANAGER_SESSIONID=node0124m6i5b2ijzlyafpyc8wm6u724.node0;Path=/;HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Last-Modified: Fri, 19 Jul 2019 06:26:10 GMT
< Content-Type: text/html;charset=utf-8
< Accept-Ranges: bytes
< Cache-Control: max-age=3600,public
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< X-Content-Type-Options: nosniff
< Content-Length: 63
<
<head><meta http-equiv="refresh" content="0;url=/cmf/"></head>
* Connection #0 to host ec2-18-144-47-252.us-west-1.compute.amazonaws.com left intact
Curl Response from some other machine.
* Rebuilt URL to: http://ec2-18-144-47-252.us-west-1.compute.amazonaws.com:7180/
* Trying 18.144.47.252...
* TCP_NODELAY set
* Connection failed
* connect to 18.144.47.252 port 7180 failed: Operation timed out
* Failed to connect to ec2-18-144-47-252.us-west-1.compute.amazonaws.com port 7180: Operation timed out
* Closing connection 0
curl: (7) Failed to connect to ec2-18-144-47-252.us-west-1.compute.amazonaws.com port 7180: Operation timed out
I have all the necessary ports open on the CM Host machine for HTTP/HTTPS and also the 7180 port.
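The two curl runs above differ in how they fail, and that difference can be checked for any port. The sketch below uses bash's `/dev/tcp` redirection with a timeout; `check_port` is a hypothetical helper name, and it assumes `bash` and coreutils `timeout` are available on the machine you run it from.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a TCP port accepts connections.
# A timeout usually means packets are being dropped somewhere on the path;
# a refusal usually means the host was reached but nothing accepted the connection.
check_port() {
  local host="$1" port="$2" wait="${3:-5}" rc=0
  timeout "$wait" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null || rc=$?
  case "$rc" in
    0)   echo "open" ;;
    124) echo "timed out" ;;
    *)   echo "refused or unreachable" ;;
  esac
}

# Example usage from the machine that runs the web browser:
#   check_port ec2-18-144-47-252.us-west-1.compute.amazonaws.com 7180
#   check_port ec2-18-144-47-252.us-west-1.compute.amazonaws.com 22
```

Comparing the result for port 22 (which works over ssh) with port 7180 shows whether the problem is specific to that port.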
Created 10-22-2019 03:06 PM
It looks like a network-related configuration issue. Can you please run the command below on the CM server host:
netstat -lnpt | grep 7180
Created 10-22-2019 03:10 PM
Find the output below.
[root@ip-172-31-24-21 etc]# netstat -lnpt | grep 7180
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 1250/java
Created 10-22-2019 03:18 PM
Are you even able to ping this host from your local machine? Are you able to telnet to other ports from your local machine?
Created 10-22-2019 04:01 PM
I am able to ping the server but not telnet to it.
However, I can connect to the server using scp, and telnet works on port 22.
Created 10-22-2019 05:49 PM
Hi Axe,
Since you can curl port 7180 successfully from the CM server host but not from other machines, this shows that the CM server is listening on port 7180 and working fine. I believe you need to work with your network team or the AWS team to investigate the network-related issue further, for example the iptables or firewalld configuration. Please let us know if you make any progress.
Created 10-22-2019 05:52 PM
Another thing worth trying is to capture network traffic on the CM server host for port 7180 (with tcpdump, for example) and check whether the host even receives the request when you access port 7180 from your web browser.
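A minimal sketch of such a capture, assuming tcpdump is installed on the CM server host. `tcpdump_filter` is a hypothetical helper that only builds the filter expression, and `203.0.113.10` is a placeholder for the public IP of the machine running the browser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a tcpdump filter for a port,
# optionally narrowed to one client address.
tcpdump_filter() {
  local port="${1:-7180}" client="${2:-}"
  if [ -n "$client" ]; then
    echo "tcp port $port and host $client"
  else
    echo "tcp port $port"
  fi
}

# Example usage on the CM server host (needs root); stops after 20 packets:
#   sudo tcpdump -nn -i any -c 20 "$(tcpdump_filter 7180 203.0.113.10)"
# Then reload the CM URL in the browser while the capture is running.
```

If no packets show up at all while the browser is trying to connect, the request is probably being dropped before it reaches the host. If incoming SYN packets do appear but get no reply, a host-level firewall rule would be the more likely suspect.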
Created 10-22-2019 08:11 PM
I would suggest going through the docs below and verifying the outbound rules for port 7180.
https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html
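The rules can also be inspected from the command line. The sketch below assumes the AWS CLI is installed and configured with credentials that can read EC2 settings. The function names are hypothetical, and the security group and subnet IDs in the comments are placeholders, not values from this thread. Since the security group is said to allow traffic only from one IP, it may also be worth ruling out a changed public IP on the client side, though nothing in the thread confirms that this is the cause.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping AWS CLI calls; all IDs passed in are your own.

# Public IP that AWS sees for the machine this runs on.
my_public_ip() {
  curl -s --max-time 10 https://checkip.amazonaws.com
}

# Inbound rules of one security group, as JSON.
sg_inbound_rules() {
  aws ec2 describe-security-groups --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissions[]' --output json
}

# Network ACL entries (inbound and outbound) for the subnet of the CM host.
subnet_acl_entries() {
  aws ec2 describe-network-acls \
    --filters "Name=association.subnet-id,Values=$1" \
    --query 'NetworkAcls[].Entries[]' --output table
}

# Example usage (placeholder IDs):
#   my_public_ip                                  # run on the browser machine
#   sg_inbound_rules sg-0123456789abcdef0
#   subnet_acl_entries subnet-0123456789abcdef0
```

Compare the address printed by `my_public_ip` with the CIDR ranges in the security group's inbound rules, and check that the network ACL has both an inbound entry for port 7180 and an outbound entry covering the ephemeral return ports.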