Cloudera Manager services not responding after running yum update on CentOS

Explorer

Hi, I have a problem with my Cloudera cluster, which I had not touched for several weeks. It consists of 1 master node and 6 slave nodes (7 VMs). I decided to run yum update and restart all the nodes. After restarting, I noticed the following:


Symptoms:
- Unable to access Hue at localhost:8888
- Unable to restart the cluster via Cloudera Manager (port 7180)
- Unable to restart the Cloudera Management Service
- Cloudera Manager looks like this: [Cloudera Manager screenshot]

Steps taken:
- Restarted cloudera-scm-server and checked its status: active (exited)
- Restarted cloudera-scm-agent and checked its status: active (exited)
- Restarted cloudera-scm-server-db and checked its status: server is running
- Various other steps that I forgot to note down


Please help. I have been troubleshooting this problem for the past day.
Let me know which logs I should attach to give you more information.

Note: My last resort is to set everything up again, but I have very important code in my Hue notebooks. Does anyone know where Hue notebooks are stored?

1 ACCEPTED SOLUTION

Explorer

Hey guys! Great news! Problem finally solved after 1.5 days of troubleshooting!

While digging through the error logs, I saw an error message in the Cloudera Agent log saying "ValueError: too many values to unpack". I then searched online for solutions to that problem.
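For anyone hitting the same thing, here is a minimal sketch of how that message could be located in an agent log. The helper name find_unpack_errors is hypothetical, and the default log path is an assumption based on a standard package install; pass your own path if yours differs.

```shell
# Print the lines of an agent log that contain the unpack error,
# prefixed with their line numbers.
# $1: log file to search. The default path is an assumption; adjust as needed.
find_unpack_errors() {
    log_file=${1:-/var/log/cloudera-scm-agent/cloudera-scm-agent.log}
    # -F matches the text literally, -n adds line numbers.
    grep -n -F 'ValueError: too many values to unpack' "$log_file"
}

# Example:
# find_unpack_errors /path/to/agent.log
```

The line numbers make it easier to open the log at the right place and read the traceback around the error.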

In conclusion, the errors were caused by:
1) System time synchronization was disabled, so the nodes' clocks were out of sync.
2) The latest OpenJDK update broke the Cloudera Agent.

Solution:

1) Enable system time synchronization (as suggested by @saranvisa):

service ntpd start

2) Uninstall OpenJDK on each node:

rpm -qa | grep jdk

yum remove <each item from the previous step>

3) Run "Re-run upgrade Wizard" in Cloudera Manager and wait for Inspect Host to finish. Done!
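A rough sketch of step 2, in case it helps. Note that a plain grep for "jdk" can also match non-OpenJDK packages (for example an Oracle JDK that the cluster still needs), so this version filters on "openjdk" and only prints the removal commands for review instead of running them. The function names are hypothetical, and the -y flag is my addition.

```shell
# Read package names on stdin and print only those that look like OpenJDK.
filter_openjdk() {
    # "|| true" keeps the function from failing when nothing matches.
    grep -i 'openjdk' || true
}

# Print (do not run) one removal command per OpenJDK package.
print_removals() {
    filter_openjdk | while read -r pkg; do
        echo "yum remove -y $pkg"
    done
}

# Example, run as root on each node and review the output first:
# rpm -qa | print_removals
```

Printing the commands first gives you a chance to check that nothing Cloudera depends on is in the list before removing anything.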

Thanks so much for the help guys!


Reference:
https://community.cloudera.com/t5/Cloudera-Manager-Installation/Problem-with-cloudera-agent/td-p/476...
https://community.cloudera.com/t5/Cloudera-Manager-Installation/Mismatched-CDH-versions-host-has-NON...


9 REPLIES

Explorer

Did you restart your machine while it was updating?

Can you share the Cloudera Manager logs here?

Explorer

Hi there. Nope, I only restarted my machine after the yum update command had finished.
Which logs do you need?
/var/lib/cloudera-scm-server/?
/var/lib/cloudera-scm-agent/?
/var/lib/cloudera-scm-server-db/?

Explorer

Check all of them for any exceptions during startup:
/var/lib/cloudera-scm-server/
/var/lib/cloudera-scm-agent/
/var/lib/cloudera-scm-server-db/

Explorer

It took me some time to copy them online. Here you go, @Dilshad.

From the master node:
/var/log/cloudera-scm-server/
http://pastebin.com/5AiCP7Wm

/var/lib/cloudera-scm-server-db/data/pg_log
http://pastebin.com/L5S71JRx

From a slave node:
/var/log/cloudera-scm-server-agent/
http://pastebin.com/k8LMAGG8

Champion

@codenchips

 

Go to Cloudera Manager -> Hosts, check the host status, and see what kind of issue it reports.

Also log in as root on Linux and run the commands below:

service ntpd status

service ntpd start

service ntpd status

Then restart CM and try again.
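If you want to see how far apart the clocks actually are, something like the following sketch could help. It assumes passwordless SSH from the machine you run it on; the function names and the host names in the example are hypothetical.

```shell
# Absolute difference between two Unix timestamps, in seconds.
skew_seconds() {
    d=$(( $1 - $2 ))
    # Strip a leading minus sign to get the absolute value.
    echo "${d#-}"
}

# Print the clock skew between this machine and each host given as an argument.
# Assumes passwordless SSH to every host; the SSH round trip adds a small error.
report_skew() {
    for host in "$@"; do
        remote=$(ssh "$host" date +%s) || { echo "$host: unreachable"; continue; }
        now=$(date +%s)
        echo "$host: $(skew_seconds "$now" "$remote")s"
    done
}

# Example (hypothetical host names):
# report_skew master slave1 slave2
```

This only gives a rough number, but a skew of more than a few seconds on any node is a sign that ntpd is not doing its job there.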

Explorer

Hi there, @saranvisa! I have checked the host status. It shows "Unknown Health".
As for your second suggestion, I checked all the nodes. For some reason ntpd was not running (it was running before this), so I started it on each node and then restarted CM. There was no difference.

Champion

@codenchips

 

Go to CM -> Hosts -> click on each host -> Health History (bottom left) and share the details with me.

Explorer

@saranvisa 
The Health History shows this message, as in the screenshot linked below:

"The Event Server is currently unavailable. View the status of the Event Server"

http://imgur.com/a/8IaSJ
