Created on 02-28-2017 01:28 AM - edited 09-16-2022 04:10 AM
Hi, I have a problem with my Cloudera cluster, which I had not touched for several weeks. It consists of 1 master node and 6 slave nodes (7 VMs). I decided to run yum update and restart all the nodes. After restarting, I noticed the following:
Symptoms:
- Unable to access Hue at localhost:8888
- Unable to restart the cluster via Cloudera Manager (port 7180)
- Unable to restart Cloudera Management Service
- Cloudera Manager looks like this:
[Screenshot: Cloudera Manager]
Steps taken:
- Restarted cloudera-scm-server and checked its status: active (exited)
- Restarted cloudera-scm-agent and checked its status: active (exited)
- Restarted cloudera-scm-server-db and checked its status: server is running
- Various other steps that I forgot to note down
Please help. I have been troubleshooting this problem for the past day.
Let me know which logs I should attach to give you more information.
Note: My last resort is to set everything up again, but I have very important code in my Hue Notebook. Does anyone know where Hue Notebooks are stored?
Created on 03-01-2017 12:14 AM - edited 03-01-2017 12:16 AM
Hey guys! Great news! Problem finally solved after 1.5 days of troubleshooting!
While digging through the error logs, I saw an error message in the Cloudera Agent log saying "ValueError: too many values to unpack". I then searched online for solutions to that error.
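For anyone hunting for the same message, here is a minimal sketch of how the agent log could be searched. The function name find_unpack_errors is hypothetical, and the default log path is an assumption about a typical install; pass your own path if the agent logs elsewhere.

```shell
# Print every line (with its line number) that contains the unpack error.
# Usage: find_unpack_errors [logfile]
find_unpack_errors() {
    # Assumption: default agent log location; override with the first argument.
    grep -n -F "ValueError: too many values to unpack" \
        "${1:-/var/log/cloudera-scm-agent/cloudera-scm-agent.log}"
}
```

Run it on each node as a user that can read the log. No output means the message was not found in that file.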
In conclusion, the errors were caused by
1) System time synchronization was disabled, so the nodes' system clocks were out of sync.
2) The latest Java OpenJDK update broke the Cloudera Agent.
Solution:
1) Enabling system time synchronization (as suggested by @saranvisa)
service ntpd start
2) Uninstalling OpenJDK on each node.
rpm -qa | grep jdk
yum remove <each item from the previous step>
3) Run "Re-run Upgrade Wizard" in Cloudera Manager and wait for Inspect Hosts to finish. Done!
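A small sketch for step 2, in case it helps. Note that grep jdk can also match non-OpenJDK packages (for example an Oracle JDK that the cluster may still need), so this version filters on "openjdk" only and just lists candidates without removing anything. The function names are hypothetical.

```shell
# Keep only package names that mention "openjdk" (case-insensitive).
filter_openjdk() {
    grep -i 'openjdk'
}

# List installed OpenJDK packages for review; nothing is removed here.
list_openjdk_packages() {
    rpm -qa | filter_openjdk
}
```

Review the output of list_openjdk_packages on each node first, then pass the names you really want gone to yum remove.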
Thanks so much for the help guys!
Reference:
https://community.cloudera.com/t5/Cloudera-Manager-Installation/Problem-with-cloudera-agent/td-p/476...
https://community.cloudera.com/t5/Cloudera-Manager-Installation/Mismatched-CDH-versions-host-has-NON...
Created 02-28-2017 04:27 AM
Did you restart your machine while it was updating?
Can you share the Cloudera Manager logs here?
Created 02-28-2017 04:43 AM
Hi there. No, I only restarted my machine after the yum update command had finished.
Can you tell me which logs you need?
var/lib/cloudera-scm-server/?
var/lib/cloudera-scm-agent/?
var/lib/cloudera-scm-server-db/?
Created 02-28-2017 05:10 AM
Created on 02-28-2017 07:55 PM - edited 02-28-2017 07:59 PM
It took me some time to upload them. Here you go, @Dilshad.
From Master node:
var/log/cloudera-scm-server/
http://pastebin.com/5AiCP7Wm
var/lib/cloudera-scm-server-db/data/pg_log
From Slave node:
var/log/cloudera-scm-server-agent/
Created 02-28-2017 07:11 AM
Go to Cloudera Manager -> Hosts, check the host status, and see what kind of issue it reports.
Also, log in as root on Linux and run the commands below:
service ntpd status
service ntpd start
service ntpd status
Then restart CM and try again.
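The status/start/status sequence above could be wrapped as follows. This is only a sketch: ensure_running is a hypothetical helper, and it assumes that service <name> status exits non-zero when the service is stopped, which is the usual convention.

```shell
# Start a service only if its status check reports that it is not running.
# Usage: ensure_running ntpd   (run as root)
ensure_running() {
    svc="$1"
    if service "$svc" status >/dev/null 2>&1; then
        echo "$svc already running"
    else
        service "$svc" start
    fi
}
```

Calling ensure_running ntpd on every node does the same thing as the three commands, without restarting a daemon that is already up.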
Created on 02-28-2017 07:58 PM - edited 02-28-2017 08:00 PM
Hi there, @saranvisa! I have checked the host status. It shows "Unknown Health".
As for your second suggestion, I checked all the nodes. The ntpd service was not running for some reason (it was running before this), so I started it on every node and restarted CM. It made no difference.
Created 02-28-2017 08:46 PM
Go to CM -> Hosts -> click on each host -> Health History (bottom left), and share the details with me.
Created 02-28-2017 09:59 PM
@saranvisa
It shows this message in the Health History, as shown in the attached screenshot:
"The Event Server is currently unavailable. View the status of the Event Server"