Created on 07-19-2019 12:27 PM - edited 09-16-2022 08:52 AM
I want to raise a concern about Director: when deploying a large number of hosts to a deployment, it doesn't increase the non-Java memory allocated to the Host Monitor (or at least not enough). As a result, the Host Monitor fails to start, because CM 5.16.2 flags this as an error condition. This doesn't fail the bootstrap (thankfully), but it does require the user to go into CM and manually apply the recommended settings for the Host Monitor before it can be started and begin reporting host metrics.
Created on 07-19-2019 12:37 PM - edited 07-19-2019 12:38 PM
For reference, what I'm doing to address this is setting the following in the API call that creates the deployment, in DeploymentTemplate(configs=...):
"SERVICEMONITOR" : { "firehose_non_java_memory_bytes" : "6442450944" }, "HOSTMONITOR" : { "firehose_non_java_memory_bytes" : "6442450944" }
This makes sure the Host Monitor and Service Monitor (SMON) each have 6 GiB (6442450944 bytes) configured for non-Java memory.
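As a rough sketch of how that configs mapping could be built rather than hard-coding the byte count: the helper name `firehose_memory_configs` below is made up for illustration and is not part of the Director client. Only the role type names and the `firehose_non_java_memory_bytes` property come from the snippet above.

```python
import json

GIB = 1024 ** 3  # bytes in one GiB


def firehose_memory_configs(gib: int) -> dict:
    """Build the configs mapping for the Host Monitor and Service Monitor.

    Hypothetical helper: it only assembles a plain dict in the same shape
    as the JSON shown above. The value is a string, as in the original.
    """
    value = str(gib * GIB)
    return {
        "SERVICEMONITOR": {"firehose_non_java_memory_bytes": value},
        "HOSTMONITOR": {"firehose_non_java_memory_bytes": value},
    }


configs = firehose_memory_configs(6)
print(json.dumps(configs, indent=2))
```

The resulting dict is what I pass as the configs argument when creating the deployment.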
Created 07-19-2019 12:54 PM
Can you please clarify whether you were already using those custom settings and they are not being applied correctly, or whether you started providing those custom settings as a solution for the problem you raised in this post?
If it's the latter, I'm glad you found the custom settings. That was going to be my recommendation.
What version of Director are you on? While implementing improvements to the deployment/cluster refresh process in 6.2, we ran into some bugs around those settings getting overwritten during refresh. There is a new blacklist in place to avoid overwriting those properties (so that CM's autoconfigured values don't get smashed), but that could prevent your custom values from taking effect. So if you run into any problems on 6.2 or later, please let us know.
Created 07-19-2019 01:26 PM
I started providing those custom settings as a solution for what I was seeing using Director 6.1.
That is an interesting point. So when using Director 6.2+, the properties I specified would be blacklisted and not applied? Would Director reject the API creation request, fail the bootstrap, or just silently ignore those parameters and let CM autoconfigure itself?
When 6.3 comes out I will upgrade to it for the connection pool adjustments and verify what the current behavior is with respect to overriding these sorts of properties.
Created 07-19-2019 01:53 PM
For every choice of semantics for these properties, we found failure scenarios that would result in bad values overwriting good values, either in CM during bootstrap or grow, or in Director's deployment or cluster template during refresh. The blacklist was the least-bad solution.
The blacklisting will not result in errors if you provide the properties--they will just be silently ignored when Director passes the configuration to CM. With the blacklists in place, CM becomes the source of truth for these properties, so you will still be able to set them in CM if the autoconfigured values are not acceptable. If for some reason you have problems bootstrapping without them, the bootstrap and update blacklists are separately configurable, so you might be able to provide them at bootstrap. I can't recall what the potential failure scenarios were there.
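To illustrate the "silently ignored" behavior, here is a minimal sketch of the semantics only. It is not Director's actual implementation: the function name `drop_blacklisted`, the blacklist contents, and the `some_other_property` key are all assumptions made up for the example.

```python
def drop_blacklisted(configs: dict, blacklist: set) -> dict:
    """Return a copy of configs with blacklisted property names removed.

    Hypothetical illustration: no error is raised for a blacklisted
    property, it is simply left out of what would be passed on to CM.
    """
    return {
        role: {name: value for name, value in props.items() if name not in blacklist}
        for role, props in configs.items()
    }


# Assumed blacklist for illustration only.
blacklist = {"firehose_non_java_memory_bytes"}

requested = {
    "HOSTMONITOR": {
        "firehose_non_java_memory_bytes": "6442450944",
        "some_other_property": "value",  # placeholder, not a real CM property
    }
}

passed_to_cm = drop_blacklisted(requested, blacklist)
print(passed_to_cm)
```

In this sketch the request still succeeds; the blacklisted property just never reaches CM, so CM's own value for it stays in effect.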
Created 07-22-2019 07:39 AM
Alright awesome. Thanks for providing clarity regarding what the expected behavior will be!