Created on 01-05-2017 01:07 PM - edited 09-16-2022 03:53 AM
Is there any tool like "Chaos Monkey" that I can use in an Ambari cluster setup? I am trying to test HA. What is the best way to test it in my cluster?
Created 01-06-2017 08:20 PM
Good Q. Not explicitly, AFAIK. We do have an integral chaos monkey in Slider (incubating), which you just turn on, give a sleep time and then schedule multiple actions (worker death, AM death).
If you are working with an EC2 cluster, you can just use Netflix's Chaos Monkey lib and have it do the killing.
Otherwise, the general best practise is to have something automated to SSH in and find/kill processes. I don't have any up-to-date code for this; I used to have some, but it targeted older Linux versions and has probably aged by now. I'm afraid you'll have to look around online for that.
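To give a rough idea of the "SSH in and kill" approach, here is a minimal Python sketch. It is not the old code mentioned above, just an illustration. It assumes passwordless SSH to the target host and that `pkill` is installed there; the function names, the host name and the process pattern are all made up for the example.

```python
import shlex
import subprocess


def build_kill_command(host, pattern, sig="KILL"):
    """Build an ssh command line that signals remote processes matching `pattern`.

    Assumption: `pattern` is a plain literal string (not a regex) that starts
    with a letter or digit.
    """
    # Wrap the first character in brackets so the regex still matches the
    # target process but not the remote shell whose own command line
    # contains the pattern text.
    guarded = "[" + pattern[0] + "]" + pattern[1:]
    remote = "pkill -{} -f {}".format(sig, shlex.quote(guarded))
    # BatchMode makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def kill_remote(host, pattern, sig="KILL"):
    """Run the kill over SSH; True if pkill reported at least one match."""
    result = subprocess.run(build_kill_command(host, pattern, sig))
    return result.returncode == 0
```

Something like `kill_remote("nn1.example.com", "NameNode")` (hypothetical host) would then take out a matching daemon, and you can watch whether the standby takes over. Run it from a scheduler with a random choice of host and pattern and you have a crude chaos monkey.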
What is really, really slick for testing HA failover is code to turn real/virtual network switches off. This is good as it lets you rigorously test what happens if there's a network partition and everything stays running, just unreachable.
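If you can't script the switches themselves, a host-level approximation is to drop traffic to and from a peer with `iptables`. To be clear, this is a swapped-in technique, not switch control, and the sketch below is only an assumption of how you might wire it up: it needs root on the node, and the helper names and the peer address are hypothetical.

```python
import subprocess


def partition_rules(peer_ip, action="-I"):
    """Return iptables commands that drop all traffic to and from `peer_ip`.

    Use action="-I" to insert the rules (start the partition) and
    action="-D" to delete the same rules (heal the partition).
    """
    return [
        ["iptables", action, "INPUT", "-s", peer_ip, "-j", "DROP"],
        ["iptables", action, "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


def apply_rules(rules):
    """Run each rule; requires root. Raises CalledProcessError on failure."""
    for rule in rules:
        subprocess.run(rule, check=True)
```

Calling `apply_rules(partition_rules("10.0.0.12"))` on a node cuts it off from that peer while every process keeps running, and `apply_rules(partition_rules("10.0.0.12", "-D"))` heals it again.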
Pro tip: issuing a kill -SIGSTOP is a great way to simulate a hung (as opposed to a failed) process.
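The same trick from Python, as a minimal sketch for a process on the local machine. The function name is made up, and you need permission to signal the target PID.

```python
import os
import signal
import time


def hang_process(pid, seconds):
    """Suspend a process for `seconds`, then let it continue.

    While stopped, the process stays alive and keeps its sockets open but
    does no work, which looks like a hang rather than a crash.
    """
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Always resume, even if the sleep is interrupted.
        os.kill(pid, signal.SIGCONT)
```

Pick a pause longer than whatever failure-detection timeout you are testing, otherwise the failover logic never gets a chance to notice.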
Created 01-07-2017 10:44 AM
I am using our own Unix instances on AWS, which are not exactly the EC2 type. I have installed Ambari. Can you please let me know the steps to enable it?
Created 01-09-2017 09:45 AM
Like I said, if you are running on EC2, you should be able to play with Netflix's Chaos Monkey directly. I haven't used it for a while; https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide covers starting it. I think it's got more complex than in the early days, when it was more of a CLI thing.