Ran into an interesting scenario in Director that I have not solved yet but will (hopefully) today.
I created a cluster via the Python SDK, and when I attempt to repair a node in any instance group I get "A template must be specified". There is nothing in the logs, and the instance group the error points to contains only gateway roles for HDFS and YARN.
This is Director v6.1, and there are no templates in the Templates tab since everything was generated programmatically.
The error seems to flag the instance group made up solely of gateways, no matter which group I try to repair.
What's interesting is that you can click View Template and it shows all the expected information above the error. Is there a specific log setting I can turn on for this in Director that wouldn't flood the logs?
I strongly suspect that Director accepted a ClusterTemplate with a VirtualInstanceGroup made up of VirtualInstances whose InstanceTemplates had names longer than 40 characters (41, to be exact, for the one group that is failing).
A moment ago I built a smaller PoC that only creates the InstanceTemplates, so that they would appear in Director and I could then add them manually to see whether the issue was the same. That is when I hit the following error:
"(The name must have a length of 2-40 characters, the first and last of which must be alphanumeric. The rest may include space, underscore, and hyphen.)"
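Since Director seemingly accepted the over-long name when it came in through the SDK, a quick client-side check is easy to bolt onto a generator script. This is a minimal sketch, assuming my reading of the message is right (2-40 characters, first and last alphanumeric, interior characters limited to letters, digits, space, underscore, and hyphen) and that "alphanumeric" means ASCII only. is_valid_director_name is a hypothetical helper, not part of the Director SDK.

```python
import re

# Hypothetical client-side check mirroring the rule quoted in Director's error:
# 2-40 characters, first and last alphanumeric, interior characters may also be
# space, underscore, or hyphen. "Alphanumeric" is assumed to mean ASCII only.
_DIRECTOR_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 _-]{0,38}[A-Za-z0-9]")


def is_valid_director_name(name: str) -> bool:
    """Return True if name satisfies the naming rule from the error message."""
    return _DIRECTOR_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # A 41-character name fails, as does a name containing a period.
    for candidate in ["masters", "masters.1", "x", "a" * 40, "a" * 41]:
        print(len(candidate), repr(candidate), is_valid_director_name(candidate))
```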
I will trim the names to 40 characters or fewer and deploy a new cluster to see whether the issue persists.
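For the trimming step, something along these lines should do. trim_director_name is a hypothetical helper (again assuming the rule as quoted above): it swaps disallowed characters such as a period for hyphens, truncates to 40 characters, and then strips any non-alphanumeric characters left at either end so the first and last characters are valid.

```python
import re

MAX_NAME_LENGTH = 40  # upper bound quoted in Director's error message


def trim_director_name(name: str, max_length: int = MAX_NAME_LENGTH) -> str:
    """Hypothetical helper: coerce a generated name into the quoted naming rule.

    1. Replace characters outside [A-Za-z0-9 _-] (e.g. '.') with '-'.
    2. Truncate to max_length characters.
    3. Strip leading/trailing characters that are not ASCII alphanumeric.
    Raises ValueError if fewer than 2 characters remain.
    """
    cleaned = re.sub(r"[^A-Za-z0-9 _-]", "-", name)
    truncated = cleaned[:max_length]
    trimmed = re.sub(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "", truncated)
    if len(trimmed) < 2:
        raise ValueError(f"cannot derive a valid name from {name!r}")
    return trimmed


if __name__ == "__main__":
    print(trim_director_name("masters.1"))      # period replaced with a hyphen
    print(len(trim_director_name("x" * 41)))    # truncated to the 40-character limit
```

One caveat: truncation can collapse two previously distinct names into the same string, so it's worth checking the generated names for duplicates after trimming.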
However, I do believe it is a bug that Director can get into this state at all!
Likewise, if you change the name in the VirtualInstanceGroup to masters.1 (i.e., add an invalid character such as a period), Director will create the cluster successfully, but the user will then be unable to repair nodes, grow, shrink, or clone the cluster from the UI.
Director also just grays out the Continue button and gives the user no feedback in either the logs or the UI.
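Until Director surfaces this itself, a pre-flight check over the whole template would catch both the 41-character InstanceTemplate names and a group name like masters.1 before anything is submitted. Rough sketch below; it assumes the template has first been assembled as a plain dict whose keys (virtual_instance_groups, virtual_instances, template, name) loosely mirror the SDK's ClusterTemplate / VirtualInstanceGroup / VirtualInstance / InstanceTemplate nesting. Those key names and find_invalid_names are hypothetical, not the SDK's actual attributes.

```python
import re
from typing import Any, Dict, List

# Naming rule as quoted in Director's error message; "alphanumeric" is assumed
# to mean ASCII letters and digits.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 _-]{0,38}[A-Za-z0-9]")


def find_invalid_names(cluster_template: Dict[str, Any]) -> List[str]:
    """List every group or instance-template name that breaks the rule.

    cluster_template is a hypothetical plain-dict stand-in for the SDK
    objects, shaped like:
        {"virtual_instance_groups": {
            "<group key>": {"name": ...,
                            "virtual_instances": [{"template": {"name": ...}}]}}}
    """
    problems: List[str] = []
    for key, group in cluster_template.get("virtual_instance_groups", {}).items():
        group_name = group.get("name", "")
        if not _NAME_RE.fullmatch(group_name):
            problems.append(f"group {key!r}: invalid group name {group_name!r}")
        seen = set()  # many virtual instances share one template; report it once
        for instance in group.get("virtual_instances", []):
            template_name = instance.get("template", {}).get("name", "")
            if template_name in seen:
                continue
            seen.add(template_name)
            if not _NAME_RE.fullmatch(template_name):
                problems.append(
                    f"group {key!r}: invalid instance template name "
                    f"{template_name!r} ({len(template_name)} chars)"
                )
    return problems


if __name__ == "__main__":
    example = {
        "virtual_instance_groups": {
            "masters": {"name": "masters.1",
                        "virtual_instances": [{"template": {"name": "master"}}]},
            "gateways": {"name": "gateways",
                         "virtual_instances": [{"template": {"name": "g" * 41}}] * 2},
        }
    }
    for problem in find_invalid_names(example):
        print(problem)
```

Failing fast on a non-empty list in the generator script gives the feedback the UI currently doesn't.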