Support Questions

koffitse · ‎07-15-2016

I have a cluster of 10 nodes managed with Ambari 2.2.1. The config DB is MySQL and the HDP version is 2.4.0. After the DB filesystem ran out of space, I'm unable to start or stop any service through the UI. Even after restarting the cluster, I still can't start services. All requests get stuck and I can't abort them. Can someone help me?

dsharma · ‎07-15-2016

What was the space issue on the DB? Have you resolved it, and are you trying to restart after that?

koffitse · ‎07-15-2016

Thank you Deepak. I mean the configuration database was on a small disk partition that filled up, so MySQL shut down. We did a cleanup and restarted MySQL and the Ambari server. After that, nothing works normally.

sunile_manjee · ‎07-15-2016

@Samie WALA Can you post what the Ambari log shows when you try to restart a service?

koffitse · ‎07-15-2016

@Sunile Manjee Yes, the ambari-server.log file contains a lot of information. Among other things, it showed that some tables had crashed and needed to be repaired, which is what I did first. After that, I'm seeing this Java error:

15 Jul 2016 20:11:34,776  WARN [ambari-action-scheduler] ActionScheduler:200 - Exception received
java.lang.RuntimeException: Invalid DB state, broken one-to-one relation for taskId=30710
	at org.apache.ambari.server.actionmanager.HostRoleCommand.getExecutionCommandWrapper(HostRoleCommand.java:371)
	at org.apache.ambari.server.actionmanager.Stage.loadExecutionCommandWrappers(Stage.java:216)
	at org.apache.ambari.server.actionmanager.Stage.checkWrappersLoaded(Stage.java:203)
	at org.apache.ambari.server.actionmanager.Stage.getExecutionCommands(Stage.java:595)
	at org.apache.ambari.server.actionmanager.ActionScheduler.isStageHasBackgroundCommandsOnly(ActionScheduler.java:529)
	at org.apache.ambari.server.actionmanager.ActionScheduler.filterParallelPerHostStages(ActionScheduler.java:485)
	at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:251)
	at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:195)
	at java.lang.Thread.run(Thread.java:745)
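
To see whether more than one task is affected, the task IDs can be pulled out of the log. The following is only a minimal sketch: it assumes the message format shown in the trace above, and the function name and sample lines are made up for illustration.

```python
import re

# Pattern based only on the message text in the trace above; other Ambari
# versions may word this differently (assumption).
TASK_ID_PATTERN = re.compile(r"broken one-to-one relation for taskId=(\d+)")

def extract_broken_task_ids(log_lines):
    """Return the sorted, de-duplicated task IDs named in matching log lines."""
    task_ids = set()
    for line in log_lines:
        match = TASK_ID_PATTERN.search(line)
        if match:
            task_ids.add(int(match.group(1)))
    return sorted(task_ids)

# Sample lines: the first is copied from the trace, the second is an
# unrelated stack frame that should be ignored.
sample_lines = [
    "java.lang.RuntimeException: Invalid DB state, broken one-to-one relation for taskId=30710",
    "\tat java.lang.Thread.run(Thread.java:745)",
]
print(extract_broken_task_ids(sample_lines))
```

For a real log, the same function can be fed the lines of ambari-server.log opened with `open(path)`; the path depends on the installation.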

sunile_manjee · ‎07-15-2016

@Samie WALA I have not seen this before. I need to do a quick dive into the code. Until then, @Artem Ervits, have you seen this?

aervits · ‎07-15-2016

Please restore a backup of the database to a new, larger partition and restart the Ambari server. The cleanup of the DB concerns me.

koffitse · ‎07-16-2016

I finally solved the issue by tracking taskId 30710 in the Ambari DB. I found an inconsistency caused by the DB crash: the tables "execution_command" and "host_role_command" contained references to this task ID, while the table "task" had none. These two tables were the ones I repaired after the DB crashed.

After cleaning up these tables, I was able to restart all services.
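
For anyone hitting the same error, the kind of consistency check described above can be sketched as follows. This is only an illustration and makes several assumptions: it uses an in-memory SQLite database instead of the real MySQL one, the tables are reduced to a single hypothetical task_id column, and the sample rows are invented. The real Ambari schema has many more columns and may differ between versions, so take a backup before changing anything.

```python
import sqlite3

def build_demo_db():
    """Create a tiny stand-in database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE host_role_command (task_id INTEGER PRIMARY KEY);
        CREATE TABLE execution_command (task_id INTEGER PRIMARY KEY);
        -- Invented rows: 30710 is missing its counterpart on purpose.
        INSERT INTO host_role_command VALUES (30708), (30709), (30710);
        INSERT INTO execution_command VALUES (30708), (30709);
    """)
    return conn

def find_unmatched_task_ids(conn, left_table, right_table):
    """Return task_ids present in left_table but absent from right_table."""
    # Table names cannot be bound as SQL parameters, so restrict them
    # to a fixed set before building the query string.
    allowed = {"host_role_command", "execution_command"}
    if left_table not in allowed or right_table not in allowed:
        raise ValueError("unexpected table name")
    query = f"""
        SELECT l.task_id
        FROM {left_table} AS l
        LEFT JOIN {right_table} AS r ON r.task_id = l.task_id
        WHERE r.task_id IS NULL
        ORDER BY l.task_id
    """
    return [row[0] for row in conn.execute(query)]

conn = build_demo_db()
print(find_unmatched_task_ids(conn, "host_role_command", "execution_command"))
print(find_unmatched_task_ids(conn, "execution_command", "host_role_command"))
```

The LEFT JOIN with an IS NULL filter is a read-only way to list the unmatched IDs first, so the rows can be inspected before deciding what to clean up.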

But to avoid any related problems that could occur later, I restored an older Ambari DB as @Artem Ervits suggested.

Now all services are running fine.

Thank you all for your help.

Cloudera Community

Unable to start services through AMBARI UI