I ran an API call to decommission a datanode.
I used the following check to make sure the role was indeed decommissioned:
if role.commissionState == 'DECOMMISSIONED':
Then I ran the role deletion.
But I got this error:
Removing datanode roles..Failed to remove datanode role on the host Role hdfs-DATANODE-4a8948d61dc8a4727f810f736d9d3447 has 1 active commands (error 400)
It turned out that after commissionState became DECOMMISSIONED, the UI still showed the decommissioning command running for another 10 to 15 seconds.
To work around that, I had to let the program sleep for an additional 60 seconds after the commission state became DECOMMISSIONED.
Is this a known issue in API v11?
What you did to work around this is fine. Basically, what you really want to do is wait till the decommission command is complete since Cloudera Manager will not let you delete the role until commands running against that role are complete.
Since "decommission" returns a command object, I imagine you could use the id to query the commands list for that service. I imagine we have some example code for that lying around, but I'm not sure where it is.
Waiting 60 seconds is likely OK, but verifying the command has completed is more sound.
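To make "verify the command has completed" concrete, here is a minimal sketch of a polling loop with a timeout. It assumes the command state can be re-fetched through a zero-argument callable and that the returned object has a boolean `active` attribute; the helper name `wait_until_inactive` and its parameters are made up for illustration and are not part of any Cloudera library.

```python
import time


def wait_until_inactive(fetch_command, timeout=120.0, poll_interval=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll until the command reports active == False or the timeout expires.

    fetch_command: zero-argument callable returning the latest command state,
    an object with a boolean `active` attribute (assumed shape).
    Returns the last state seen; raises TimeoutError if still active.
    """
    deadline = clock() + timeout
    while True:
        cmd = fetch_command()
        if not cmd.active:
            # Command finished; the caller should still check for success.
            return cmd
        if clock() >= deadline:
            raise TimeoutError(
                "command still active after %s seconds" % timeout)
        sleep(poll_interval)


# Hypothetical usage, assuming `service` is a service object from the Python
# API client, `role_name` is the datanode role, and the command object can
# refresh itself via a `fetch` method:
#
#   cmd = service.decommission(role_name)
#   wait_until_inactive(cmd.fetch, timeout=300)
#   service.delete_role(role_name)
```

Deleting the role only after the command is no longer active avoids guessing how long the fixed sleep needs to be, and the timeout keeps the script from hanging if the command never finishes.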
Thanks, the wait() worked. But I was wondering which actions have a wait method. Now I know that service start, decommission, etc. have a wait method, but role start, stop, etc. don't.
Also, is there a timeout option for wait()?