The ownership of multiple files on our cluster was accidentally changed to root. After restarting the services cloudera-scm-server-db, cloudera-scm-server, and cloudera-scm-agent, all of them start gracefully. I am also able to log in to the Cloudera Manager console, but none of the services seem to be working, and even the Cloudera Management Service fails to restart.
I found that no files are being created in /var/run/cloudera-scm-agent/process/ on the NameNode.
Could you please provide a list of files or directories whose owner needs to be cloudera-scm (or another user other than root), so that we can restore the ownership and start our cluster?
It sounds as if you don't know which files got changed. Unless you were running the cluster in single-user mode, the agents will create the files in /var/run/cloudera-scm-agent/process, and the agents run as root. Permissions won't be an issue there.
So, let's step back and find out what the problem is. When you open Cloudera Manager, what do you see? Are there question marks next to all services? What happens when you try to restart the Management Service? You'll want to check the following:
- stdout, stderr, and the role log for clues (in the command status shown when you start the service).
- the agent log from the agent on the host where the management service runs. Check /var/log/cloudera-scm-agent/cloudera-scm-agent.log for messages when starting the roles.
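The checks above can be sketched as a small helper; the default path is Cloudera Manager's usual agent log location, and the error keywords are just a starting point, not an exhaustive list:

```shell
#!/usr/bin/env bash
# Sketch: scan an agent log for lines that look like startup failures.
# Pass a different path if your install logs somewhere else.
scan_agent_log() {
  local log="${1:-/var/log/cloudera-scm-agent/cloudera-scm-agent.log}"
  # Show the most recent error-like lines; adjust keywords as needed.
  grep -iE 'error|exception|traceback' "$log" | tail -n 20
}
```

You can also confirm whether the agent is creating process directories at all with `ls -lt /var/run/cloudera-scm-agent/process/ | head`.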
Many causes are possible in this scenario, so we should focus on what is causing the start failure.
That said, when the roles attempt to start, they will try to write to their respective log directories. Make sure /var/log/cloudera-scm* is owned by "cloudera-scm:cloudera-scm"; otherwise the log files won't be writable, and that can cause a startup failure.
Also make sure /var/lib/cloudera-* is owned by "cloudera-scm:cloudera-scm".
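As a sketch of that ownership restore (assuming a standard, non-single-user install where the management roles run as cloudera-scm; the helper name is mine):

```shell
#!/usr/bin/env bash
# Sketch: restore an owner on a set of directories, skipping glob
# patterns that matched nothing. Run as root on the affected host.
fix_ownership() {
  local owner="$1"; shift
  local d
  for d in "$@"; do
    [ -e "$d" ] || continue        # unexpanded glob or missing path
    chown -R "$owner" "$d"
    echo "restored $owner on $d"
  done
}

# Typical invocation for the directories mentioned above:
#   fix_ownership cloudera-scm:cloudera-scm /var/log/cloudera-scm-* /var/lib/cloudera-*
```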
We really appreciate your time and effort in helping us.
The issue was that file permissions had been changed to root, including all logs and services. Cloudera Manager Server was running and I was able to log in, but none of the services were running. When I tried to restart the Management Service, it wasn't even able to stop the services. I rebooted a few times with no luck, so in desperation I started changing the owner of the logs and related files to cloudera-scm, hoping it would work. It didn't. I also used another cluster as a reference and restored all the permissions I could find, but it still didn't work.
Later I uninstalled Cloudera Manager, first taking a backup of the namespace and edit logs, as I had no issues with the DataNodes.
On reinstalling, I monitored the agent logs carefully and found there was an issue with OpenJDK, so I uninstalled OpenJDK. After reinstalling Cloudera Manager, my cluster was up and running; I restored my namespace and it worked, since I had all my metadata saved in MySQL.
Luckily my case got resolved. I hope this helps others if they run into similar trouble.
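For anyone wanting to check for OpenJDK on their own hosts, a minimal sketch (the helper is mine; feed it your package manager's listing, e.g. `rpm -qa` on RHEL-style hosts or `dpkg-query -W -f '${Package}\n'` on Debian-style ones, and note that the package names below are examples):

```shell
#!/usr/bin/env bash
# Sketch: filter a package listing down to OpenJDK entries.
filter_openjdk() {
  grep -i 'openjdk' || true   # || true: "no matches" is not an error
}

# Typical use on a RHEL-style host:
#   rpm -qa | filter_openjdk
#   readlink -f "$(command -v java)"   # see which JDK 'java' resolves to
# then remove the packages it printed, e.g.:
#   yum remove -y java-1.8.0-openjdk java-1.8.0-openjdk-headless
```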
Can you specify the steps for identifying OpenJDK and removing it?