How to use Chaos Monkey in Ambari cluster setup?

Is there any tool like "Chaos Monkey" that I can use in an Ambari cluster setup? I am trying to test HA. What is the best way to test it in my cluster?

1 ACCEPTED SOLUTION

Good Q. Not explicitly, AFAIK. We do have a built-in chaos monkey in Slider (incubating), which you just turn on, give a sleep time, and then schedule multiple actions (worker death, AM death).

If you are working with an EC2 cluster, you can just use Netflix's Chaos Monkey lib and have it do the killing.

Otherwise, the general best practice is to have something automated that SSHes in and finds and kills processes. I don't have any up-to-date code for this; I used to have some, but it targeted older Linux versions and has probably aged by now. I'm afraid you'll have to look around online for that.
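
To make the SSH-and-kill idea concrete, here is a minimal Python sketch. It is not the older code mentioned above, just one way it might look. It assumes passwordless SSH to each node and that `pkill` is installed there; the host names and the process pattern are hypothetical placeholders, and the function names are made up for this example.

```python
import random
import shlex
import subprocess


def build_kill_command(host, pattern, signal_name="KILL"):
    """Build the ssh argv that signals processes matching `pattern` on `host`."""
    # pkill -f matches against the full command line of each process.
    remote = f"pkill -{signal_name} -f {shlex.quote(pattern)}"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def kill_random_victim(hosts, pattern, rng=random, dry_run=True):
    """Pick one host at random and kill the matching processes there."""
    host = rng.choice(hosts)
    cmd = build_kill_command(host, pattern)
    if dry_run:
        # Only show what would be executed; nothing is killed.
        print("would run:", shlex.join(cmd))
    else:
        # pkill exits non-zero when nothing matched, so do not raise on failure.
        subprocess.run(cmd, check=False, timeout=30)
    return cmd


if __name__ == "__main__":
    # Hypothetical host names and pattern; replace with your own.
    kill_random_victim(["worker1", "worker2", "worker3"], "[D]ataNode")
```

The bracket in a pattern such as `[D]ataNode` is deliberate: because `pkill -f` looks at full command lines, a plain pattern can also match the remote shell that carries the pattern in its own arguments. Run it from cron or a loop with a sleep to get the "monkey" behaviour, and keep `dry_run=True` until the output looks right.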

What is really, really slick for testing HA failover is code to turn real/virtual network switches off. This is good as it lets you rigorously test what happens if there's a network partition and everything stays running, just unreachable.
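
If you cannot script the switches themselves, a host-level approximation is to drop traffic to and from a peer with `iptables`. This is a swapped-in technique, not the switch-level approach described above, and it only partitions the host it runs on. The sketch below is Python that builds the commands; it assumes root access on the node, and the peer address is a hypothetical placeholder.

```python
import ipaddress
import shlex
import subprocess


def partition_commands(peer_ip):
    """Commands that drop all traffic between this host and `peer_ip`."""
    ipaddress.ip_address(peer_ip)  # raises ValueError on a malformed address
    return [
        ["iptables", "-I", "INPUT", "-s", peer_ip, "-j", "DROP"],
        ["iptables", "-I", "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


def heal_commands(peer_ip):
    """Commands that delete the rules added by partition_commands."""
    ipaddress.ip_address(peer_ip)
    return [
        ["iptables", "-D", "INPUT", "-s", peer_ip, "-j", "DROP"],
        ["iptables", "-D", "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    peer = "10.0.0.5"  # hypothetical address of the node to cut off
    run_all(partition_commands(peer))
    run_all(heal_commands(peer))
```

Be careful not to cut off the address you are logged in from, or you will partition yourself out of the box along with the cluster.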

Pro tip: issuing a kill -SIGSTOP is a great way to simulate a hung (as opposed to a failed) process.
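
The same trick can be scripted. A minimal Python sketch, assuming a POSIX system and that you already know the PID of the process you want to hang; the helper name `hung` is made up for this example. SIGSTOP suspends the process and SIGCONT lets it carry on.

```python
import os
import signal
import time
from contextlib import contextmanager


@contextmanager
def hung(pid):
    """Suspend process `pid` for the duration of the with-block."""
    os.kill(pid, signal.SIGSTOP)  # process stays alive but stops running
    try:
        yield
    finally:
        os.kill(pid, signal.SIGCONT)  # always resume, even on error


def simulate_hang(pid, seconds):
    """Freeze `pid` for `seconds`, then let it continue."""
    with hung(pid):
        time.sleep(seconds)
```

Calling `simulate_hang(pid, 60)` freezes the process for a minute. Make the pause longer than the failover timeout you are testing, otherwise the process resumes before anything notices it was gone.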

3 REPLIES


I am using our own Unix instances on AWS, which are not exactly the EC2 type. I have installed Ambari. Can you please let me know the steps to enable it?

Like I said, if you are running on EC2, you should be able to use Netflix's Chaos Monkey directly. I haven't used it for a while; https://github.com/Netflix/SimianArmy/wiki/Quick-Start-Guide covers starting it. I think it has become more complex than in the early days, when it was more of a CLI thing.