Created on 10-21-2019 09:07 PM - last edited on 10-21-2019 10:30 PM by ask_bill_brooks
I have been running a 5-node (1 master) Cloudera cluster for the past week, and all of a sudden I have lost web UI access to Cloudera Manager.
I can still work with other services such as Hive, HBase, Spark, and NiFi. The status shows that both the Cloudera agent and the Cloudera server are running. I can run a curl command against port 7180 on the master node and I get output.
I cannot figure out what the problem could be.
Created on 10-21-2019 10:33 PM - edited 10-21-2019 10:34 PM
Can you share your CM logs? A usual first check is the file system status:
$ df -h
Also try restarting the CM server and the agents. Most importantly, check and share the agent logs too.
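For reference, the restart-and-check-logs step could look roughly like the sketch below. It assumes a systemd host and the default Cloudera Manager service names and log locations; adjust if your install differs. The `recent_errors` helper is a hypothetical name used only for illustration, not part of any Cloudera tooling.

```shell
# Print the last N lines tagged ERROR from a log file (N defaults to 20).
# Assumes the "<date> <time> ERROR <component>: ..." layout used by CM logs.
recent_errors() {
    logfile="$1"
    count="${2:-20}"
    grep ' ERROR ' "$logfile" | tail -n "$count"
}

# Usage (as root on the CM host; service names and paths are the usual defaults):
#   systemctl restart cloudera-scm-server
#   systemctl restart cloudera-scm-agent
#   recent_errors /var/log/cloudera-scm-server/cloudera-scm-server.log
#   recent_errors /var/log/cloudera-scm-agent/cloudera-scm-agent.log
```

Sharing just the recent ERROR lines is usually easier to read than attaching the full log.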
Created 10-22-2019 12:23 AM
@Shelton, I have checked the disk space.
[root@ip-172-31-24-21 log]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2   30G   13G   18G  43% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G   41M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
cm_processes     16G  4.8M   16G   1% /run/cloudera-scm-agent/process
tmpfs           3.1G     0  3.1G   0% /run/user/1000
Also, the services have been restarted several times. The port is listening, and the curl command run on the server returns good output.
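Since curl works locally but the browser does not, one thing worth confirming is which address port 7180 is bound to, and then repeating the curl from a machine outside the cluster. A minimal sketch, assuming `ss` from iproute2 is available; `listen_addrs` is a hypothetical helper name:

```shell
# Read `ss -ltn` output on stdin and print the local address:port of every
# listening socket on the given port.
listen_addrs() {
    awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { print $4 }'
}

# Usage on the CM host:
#   ss -ltn | listen_addrs 7180
# Then, from the machine where the browser runs (replace <cm-host>):
#   curl -m 5 -s -o /dev/null -w '%{http_code}\n' http://<cm-host>:7180/
```

If the only address printed is 127.0.0.1:7180, the server is reachable only from the host itself. If it shows 0.0.0.0:7180 or a wildcard and the remote curl still times out, the block is more likely somewhere in the network path.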
Created on 10-22-2019 09:38 AM - edited 10-22-2019 09:44 AM
The following exception appears regularly in the CM server logs:
2019-10-20 17:32:34,687 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: (11 skipped) Unable to retrieve remote parcel repository manifest
java.util.concurrent.ExecutionException: java.net.ConnectException: connection timed out: archive.cloudera.com/151.101.188.167:443
This can happen if you need an http_proxy to reach the public web, or if the host is on a private network.
CM is currently trying to reach the archive URL to download parcels, because that was the method used when CM was installed, and it is failing to do so.
Try running the command below on the CM node and let us know the output:
wget https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
If you need to set a proxy, this can be done under Administration > search for 'Proxy'.
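To compare direct access with access through a proxy before changing anything in CM, a quick probe with a short connect timeout can help. This is only a sketch: `probe_url` is a hypothetical helper, and `proxy.example.com:3128` is a placeholder for whatever proxy your network actually uses. curl picks up the `https_proxy` environment variable on its own.

```shell
# Request only the headers of a URL with a short connect timeout and report
# whether the host could be reached at all. Any HTTP response (even an error
# status) counts as reachable; a timeout or refused connection does not.
probe_url() {
    url="$1"
    timeout="${2:-10}"
    if curl --silent --show-error --head --connect-timeout "$timeout" \
            --output /dev/null "$url"; then
        echo "reachable: $url"
    else
        echo "unreachable: $url"
        return 1
    fi
}

# Usage:
#   probe_url https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
#   https_proxy=http://proxy.example.com:3128 \
#       probe_url https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
```

If the direct probe times out and the proxied one succeeds, configuring the proxy in CM is the likely fix; if both fail, the problem is outbound network access from the host.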
Created 10-22-2019 10:32 AM
As @ssulav pointed out, your problem stems from a network access problem with this server:
http://ip-172-31-24-21.us-west-1.compute.internal
Resolve that and report back.
Created 10-22-2019 12:31 PM
Below is the error I am getting.
wget https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
--2019-10-22 19:25:24-- https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
Resolving archive.cloudera.com (archive.cloudera.com)... 151.101.40.167
Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.40.167|:443... failed: Connection timed out.
Retrying.
--2019-10-22 19:27:32-- (try: 2) https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
Connecting to archive.cloudera.com (archive.cloudera.com)|151.101.40.167|:443...
I am able to connect to this server using ssh, and all 5 machines in my cluster have the same AWS security group settings. I am not sure why the server stopped working when I had not made any changes.