I have had a 5-node Cloudera cluster (1 master) running for the past week, and all of a sudden I have lost web UI access to Cloudera Manager.
I am still able to work with other services such as Hive, HBase, Spark and NiFi. The status shows that both the Cloudera agent and the Cloudera server are running. I can run curl against port 7180 on the master node and I get output.
I am not able to figure out what the problem could be.
@Shelton, I have checked the disk space:
[root@ip-172-31-24-21 log]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2   30G   13G   18G  43% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G   41M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
cm_processes     16G  4.8M   16G   1% /run/cloudera-scm-agent/process
tmpfs           3.1G     0  3.1G   0% /run/user/1000
Also, the services have been restarted several times. The port is listening, and running curl against it on the server itself returns good output.
The following exception appears regularly in the CM server logs:
2019-10-20 17:32:34,687 ERROR ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloaderImpl: (11 skipped) Unable to retrieve remote parcel repository manifest java.util.concurrent.ExecutionException: java.net.ConnectException: connection timed out: archive.cloudera.com/126.96.36.199:443
This may happen if you use an http_proxy to access the public web, or if you are on a private network.
CM is currently trying to access the archive URL to download parcels, because this was the method used when CM was installed, and it is failing to do so.
Try running the command below on the CM node and let us know the output:
If you want to set a proxy, this can be done under Administration > search for 'Proxy'.
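To see whether a proxy is already set in the shell environment on the CM node, a minimal Python sketch is shown here. It only reads the conventional proxy environment variables that command-line tools such as wget and curl honor. The helper name proxy_settings is hypothetical, and Cloudera Manager's own proxy setting is configured separately in the UI, so an empty result says nothing about CM itself.

```python
import os

# Conventional proxy variables; both spellings are checked because
# different tools look at lowercase or uppercase names.
PROXY_VARS = (
    "http_proxy", "https_proxy", "no_proxy",
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
)


def proxy_settings(environ=os.environ):
    """Return the proxy-related variables that are set and non-empty."""
    return {name: environ[name] for name in PROXY_VARS if environ.get(name)}


if __name__ == "__main__":
    found = proxy_settings()
    if found:
        for name, value in found.items():
            print(f"{name}={value}")
    else:
        print("no proxy variables set in this environment")
```

If nothing is printed as set, command-line tools on that node attempt a direct connection, so a timeout there points at the network path rather than at a misconfigured shell proxy.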
Below is the error I am getting.
--2019-10-22 19:25:24-- https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
Resolving archive.cloudera.com (archive.cloudera.com)... 188.8.131.52
Connecting to archive.cloudera.com (archive.cloudera.com)|184.108.40.206|:443... failed: Connection timed out.
--2019-10-22 19:27:32-- (try: 2) https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json
Connecting to archive.cloudera.com (archive.cloudera.com)|220.127.116.11|:443...
I am able to connect to this server over SSH, and all 5 machines in my cluster use the same AWS security group settings. I am not sure why the server stopped working when I made no additional changes.
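One way to narrow this down is to compare how a plain TCP connection behaves from different machines. The Python sketch below is only an illustration, not a Cloudera tool: the helper name probe is hypothetical and it uses nothing beyond the standard socket module. A result of "timeout" typically means packets are being dropped somewhere (security group, firewall, routing), while "refused" means the host was reached but nothing accepted the connection on that port.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try a TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The host name could not be resolved.
        return "dns-failure"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout: packets are probably being dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


# Example calls (the master host name is a placeholder, not from this thread):
#   probe("<master-host>", 7180)          # from the machine where the browser fails
#   probe("archive.cloudera.com", 443)    # from the CM node
```

Running the first call from the machine where the browser fails, and the second from the CM node, would show whether either path times out. A "timeout" for archive.cloudera.com would be consistent with the wget output above.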