Created 07-15-2016 10:31 AM
I have a cluster of 10 nodes managed with Ambari 2.2.1. The config DB is MySQL and the HDP version is 2.4.0. After a disk space issue on the DB filesystem, I'm unable to restart or stop any service through the UI. Even after a cluster restart, I still can't start services. All requests are stuck and I can't abort them. Can someone help me?
Created 07-15-2016 10:37 AM
What do you mean by a space issue on the DB? Have you resolved it, and are you trying to restart only after that?
Created 07-15-2016 07:03 PM
Thank you Deepak. I mean the configuration database was on a small disk partition which got full, so MySQL shut down. We did a cleanup and restarted MySQL and the Ambari server. Since then, nothing works normally.
Created 07-15-2016 07:14 PM
@Samie WALA can you post what the Ambari log is spitting out when you try to restart a service?
Created 07-15-2016 08:17 PM
@Sunile Manjee Yes, the ambari-server.log file contains a lot of information. Among other things, it showed that some tables had crashed and needed to be repaired, which is what I did first. After that I'm seeing this Java error:
15 Jul 2016 20:11:34,776 WARN [ambari-action-scheduler] ActionScheduler:200 - Exception received
java.lang.RuntimeException: Invalid DB state, broken one-to-one relation for taskId=30710
    at org.apache.ambari.server.actionmanager.HostRoleCommand.getExecutionCommandWrapper(HostRoleCommand.java:371)
    at org.apache.ambari.server.actionmanager.Stage.loadExecutionCommandWrappers(Stage.java:216)
    at org.apache.ambari.server.actionmanager.Stage.checkWrappersLoaded(Stage.java:203)
    at org.apache.ambari.server.actionmanager.Stage.getExecutionCommands(Stage.java:595)
    at org.apache.ambari.server.actionmanager.ActionScheduler.isStageHasBackgroundCommandsOnly(ActionScheduler.java:529)
    at org.apache.ambari.server.actionmanager.ActionScheduler.filterParallelPerHostStages(ActionScheduler.java:485)
    at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:251)
    at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:195)
    at java.lang.Thread.run(Thread.java:745)
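If the log is long, it may help to pull out every task ID that this message reports before touching the database. The following is only a sketch: the regular expression is written against the single message quoted in this thread, and the function name `broken_task_ids` is made up for illustration.

```python
import re

# Pattern based on the one log message quoted in this thread (an assumption:
# other Ambari versions may word the message differently).
BROKEN_TASK_RE = re.compile(r"broken one-to-one relation for taskId=(\d+)")

def broken_task_ids(log_text):
    """Return the sorted, de-duplicated task IDs reported as broken."""
    return sorted({int(task_id) for task_id in BROKEN_TASK_RE.findall(log_text)})

sample = (
    "15 Jul 2016 20:11:34,776 WARN [ambari-action-scheduler] ActionScheduler:200 - "
    "Exception received java.lang.RuntimeException: Invalid DB state, "
    "broken one-to-one relation for taskId=30710"
)
print(broken_task_ids(sample))
```

In practice the text would come from reading ambari-server.log rather than from a hard-coded string.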
Created 07-15-2016 09:14 PM
@Samie WALA I have not seen this before. I need to do a quick dive into the code. Until then, @Artem Ervits, have you seen this?
Created 07-15-2016 11:31 PM
Please restore a backup of the database to a new, larger partition and restart the Ambari server. The manual cleanup of the DB concerns me.
Created 07-16-2016 12:26 AM
I finally solved the issue by tracking the taskID 30710 in the Ambari DB. I found an inconsistency caused by the DB crash. The tables "execution_command" and "host_role_command" contained references to this taskID, while the table "task" had no reference to it. These two tables were the ones I had repaired after the DB crashed.
After cleaning up these tables I was able to restart all services.
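For anyone hitting the same error, here is a rough, read-only sketch of how such an inconsistency could be looked for. It rests on assumptions: that the "one-to-one relation" in the error is between the two tables named above, and that both are keyed by a `task_id` column. It uses an in-memory SQLite database with made-up rows instead of the real MySQL instance, and it deletes nothing.

```python
import sqlite3

def find_unpaired_task_ids(conn):
    """Return task IDs that appear in only one of the two tables."""
    # Assumption: both tables are keyed by a column named task_id.
    query = """
        SELECT h.task_id FROM host_role_command h
        LEFT JOIN execution_command e ON e.task_id = h.task_id
        WHERE e.task_id IS NULL
        UNION
        SELECT e.task_id FROM execution_command e
        LEFT JOIN host_role_command h ON h.task_id = e.task_id
        WHERE h.task_id IS NULL
    """
    return sorted(row[0] for row in conn.execute(query))

# Toy data only: task 2 lacks its execution_command row,
# task 4 lacks its host_role_command row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE host_role_command (task_id INTEGER PRIMARY KEY);
    CREATE TABLE execution_command (task_id INTEGER PRIMARY KEY);
    INSERT INTO host_role_command VALUES (1), (2), (3);
    INSERT INTO execution_command VALUES (1), (3), (4);
""")
print(find_unpaired_task_ids(conn))
```

Take a dump of the Ambari database before changing anything based on a result like this.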
But to avoid any related problems that could occur later, I restored an older Ambari DB backup as @Artem Ervits suggested.
Now all services are running fine.
Thank you all for your help.