Member since: 02-18-2014
Posts: 94
Kudos Received: 23
Solutions: 23
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3458 | 08-29-2019 07:56 AM |
| | 4202 | 07-09-2019 08:22 AM |
| | 2143 | 07-01-2019 02:21 PM |
| | 3663 | 03-19-2019 07:42 AM |
| | 2816 | 09-04-2018 05:29 AM |
08-29-2019
07:56 AM
Hi Da,

You're correct: you can edit the Altus Director server's logback.xml file so that it stops emitting debug messages for that Java class. It would look like this.

    <logger name="com.cloudera.api.ext.ClouderaManagerClientProxy" level="INFO" />

More details on editing logback.xml are here: https://www.cloudera.com/documentation/director/latest/topics/director_troubleshoot.html#director-intro__d72814e37

While those debug lines are noisy, they can be helpful when problems arise. So, in the future, if you are troubleshooting problems with Altus Director and Cloudera Manager, turning that logger's debug output back on might reveal some useful information.
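If you'd rather script that change than edit the file by hand, here is a minimal Python sketch. It assumes the usual logback layout where `<logger>` elements sit directly under the root `<configuration>` element. The function name and the sample XML are made up for illustration, and ElementTree drops XML comments when it re-serializes, so review the output before replacing a real logback.xml.

```python
import xml.etree.ElementTree as ET

LOGGER_NAME = "com.cloudera.api.ext.ClouderaManagerClientProxy"


def set_logger_level(logback_xml: str, name: str, level: str) -> str:
    """Return logback XML with the named logger set to the given level.

    Adds a <logger> element if none with that name exists yet.
    """
    root = ET.fromstring(logback_xml)
    for logger in root.findall("logger"):
        if logger.get("name") == name:
            logger.set("level", level)
            break
    else:
        # No matching logger found, so append a new one under the root element.
        ET.SubElement(root, "logger", {"name": name, "level": level})
    return ET.tostring(root, encoding="unicode")


# Hypothetical, stripped-down logback.xml content, for demonstration only.
SAMPLE = """<configuration>
  <root level="DEBUG" />
</configuration>"""

updated = set_logger_level(SAMPLE, LOGGER_NAME, "INFO")
print(updated)
```

Calling the same function again with "DEBUG" restores the noisy output when you need it for troubleshooting.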
07-10-2019
06:10 AM
Thanks for the update!
07-09-2019
08:22 AM
1 Kudo
I might need to see the whole file, but in case that's not feasible for you: Try working backwards from the deployment and cluster templates. For example, you might have this in the "cloudera-manager" / deployment template section (I say "might" because you've made modifications, which is fine).

cloudera-manager {
    instance: ${instances.edge} {
        ...

This means that the instance template used for CM inherits from the "instances.edge" object, with some overrides in the block that I've elided here with "...". So, you'd go back to the "instances" section, "edge" subsection. Anything there is available here. Does the "instances.edge" subsection include values for "useCustomManagedImage" and "customImagePlan"? That section in our older sample Azure files looked like this.

instances {
    ....
    edge {
        image: ${?common-instanceTemplate.base.image}
        type: ${?common-instanceTemplate.base.type}
        computeResourceGroup: ${?common-instanceTemplate.edge.computeResourceGroup}
        networkSecurityGroupResourceGroup: ${?common-instanceTemplate.base.networkSecurityGroupResourceGroup}
        networkSecurityGroup: ${?common-instanceTemplate.base.networkSecurityGroup}
        virtualNetworkResourceGroup: ${?common-instanceTemplate.base.virtualNetworkResourceGroup}
        virtualNetwork: ${?common-instanceTemplate.base.virtualNetwork}
        subnetName: ${?common-instanceTemplate.base.subnetName}
        instanceNamePrefix: ${?common-instanceTemplate.edge.instanceNamePrefix}
        hostFqdnSuffix: ${?common-instanceTemplate.base.hostFqdnSuffix}
        availabilitySet: ${?common-instanceTemplate.edge.availabilitySet}
        publicIP: ${?common-instanceTemplate.edge.publicIP}
        storageAccountType: ${?common-instanceTemplate.edge.storageAccountType}
        dataDiskCount: ${?common-instanceTemplate.edge.dataDiskCount}
        dataDiskSize: ${?common-instanceTemplate.edge.dataDiskSize}
        managedDisks: ${?common-instanceTemplate.edge.managedDisks}
        tags: ${?common-instanceTemplate.base.tags}
        bootstrapScripts: [ ${?bootstrap-script.os-generic} ]
    }

You can see that this lacks "useCustomManagedImage" and "customImagePlan". You'd need to add those two properties here for the deployment template to get their values. You can either add them literally, or refer to the values in "common-instanceTemplate.base" or "common-instanceTemplate.edge" or anywhere else they are actually defined.

Or you can just put them right into the deployment template, in the "instance" subsection under the "cloudera-manager" section, and not worry about how the inheritance should be working. You can hopefully see how the multiple layers of indirection don't help with a clear configuration.
07-09-2019
07:00 AM
Having seen a customer HOCON configuration file with this problem, I have a good guess as to the problem for you.

Until recently, our example Azure configuration files used two layers of indirection to specify configuration properties for instance templates. There was a first set in a section called "common-instanceTemplate" where the actual values were defined, and then a second set, "instances", where properties were defined based on the values in the first set. The deployment and cluster templates toward the bottom of the file used the properties in the "instances" section.

What might be happening for you (and happened for the other customer) is that the "useCustomManagedImage" and "customImagePlan" properties aren't defined in the "instances" section. So, they don't carry through from their initial definition in the "common-instanceTemplate" section to the deployment and cluster templates. If this is the case, then adding lines like these to each of the subsections under "instances" will fix the problem.

    useCustomManagedImage: ${?common-instanceTemplate.base.useCustomManagedImage}
    customImagePlan: ${?common-instanceTemplate.base.customImagePlan}

Even though these are just mistakes in the HOCON, really the problem is that our example Azure configuration files were needlessly complex. They've been updated recently to eliminate this indirection, so I also suggest taking a look and seeing if that pattern suits your uses better.

https://github.com/cloudera/director-scripts/blob/master/configs/azure.simple.conf

Please let me know if this solves your problem.
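To see why a property gets lost between the two layers, here is a toy model in Python. It is not the real HOCON parser; it only mimics the rule that an optional substitution `${?path}` leaves the field unset when the path is undefined. The helper names and the "some-image" value are invented for illustration.

```python
def lookup(config: dict, path: str):
    """Follow a dotted path through nested dicts; return None if any part is missing."""
    node = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def resolve_optional(section: dict, config: dict) -> dict:
    """Mimic ${?path}: keep a key only if the path it refers to is defined."""
    resolved = {}
    for key, path in section.items():
        value = lookup(config, path)
        if value is not None:
            resolved[key] = value
    return resolved


# First layer: where the actual values live (values here are placeholders).
first_layer = {
    "common-instanceTemplate": {
        "base": {"image": "some-image", "useCustomManagedImage": "Yes"}
    }
}

# Second layer as in the older samples: no entry for useCustomManagedImage.
edge_before = {"image": "common-instanceTemplate.base.image"}

# Second layer after adding the two lines suggested above.
edge_after = dict(
    edge_before,
    useCustomManagedImage="common-instanceTemplate.base.useCustomManagedImage",
    customImagePlan="common-instanceTemplate.base.customImagePlan",
)

before = resolve_optional(edge_before, first_layer)
after = resolve_optional(edge_after, first_layer)
print(before)  # useCustomManagedImage never reaches the templates
print(after)   # useCustomManagedImage carries through; customImagePlan stays unset
```

The second result also shows why the extra customImagePlan line is harmless when no plan is defined in the first layer: the optional reference simply resolves to nothing.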
07-09-2019
06:30 AM
Hi dturner,

This should be supported. I'm actually currently looking into this problem on a support escalation.

customImagePlan definitely does not need to be supplied when using a custom image. I think it used to be required in the past, but it no longer is, and that validation error message is just out of date.

The error message seems to be triggered because Altus Director has not noticed that the useCustomManagedImage field is set to Yes. When it sees that correctly, Altus Director skips the portion of validation that triggers the error message you are getting.

My only guess at the moment is that the HOCON parsing of the actual instance templates - not the base one, but those that inherit from it - is somehow not picking up the useCustomManagedImage field. So, maybe try repeating the useCustomManagedImage field across the instance templates to see if it helps. That's not a satisfactory final solution, but it might avoid whatever the real problem is.

In the meantime, I'm continuing to investigate, so stay tuned 🙂
07-01-2019
02:29 PM
1 Kudo
Hi GaryS,

Thanks for following up that you were able to resolve your issue!

For others, to clarify: If you change the username and password for Cloudera Manager (for example, from the default admin/admin), then you do need to update Altus Director with the new credentials. That way, Altus Director can continue to work with Cloudera Manager to do things like add new hosts to a cluster. There is an option in the dropdown for a deployment in Altus Director to update the credentials.

In case there is still a problem in communications, a workaround is to set Cloudera Manager (and Altus Director) back to admin/admin, then do what you need to do, and then switch Cloudera Manager back. There is in fact one scenario in Altus Director where this is necessary, which we're working on fixing.

To add to what Asif said about the ways to add a new cluster node through Director: Besides the UI, you can use the Altus Director server API as well. The UI is just a special client for the API, anyway. Visit http://yourdirectorhost.example:7189/api-console/ for an interactive (Swagger) console to experiment. You can also try using the Java or Python SDKs, available on GitHub.

https://github.com/cloudera/director-sdk
07-01-2019
02:21 PM
1 Kudo
Hi CK71,

This is kind of a wide-open question, so I'll give you a wide-open answer with some ideas for implementing "transient" clusters.

To start out with, you may want to think about having a cluster with a core of non-transient nodes that house management information, so-called "master" nodes that host the HDFS namenode, YARN resource manager, and so on. You also would want to keep around enough stateful nodes, like HDFS datanodes and Kudu tablet servers, so that fundamental data stays available (e.g., to stay at or above your chosen HDFS replication factor, which defaults to 3). Then, you have the ability to scale out with stateless "compute" nodes, like YARN node managers and Spark workers, when the cluster workload increases, and then tear them down when the load is lighter.

Next, a good goal is to store important data on cloud storage services, like S3 for AWS and ADLS for Azure. Hadoop and other services have the ability to reference data in those services directly - for example, Hive and HDFS can be backed by S3 - or you can establish ways to copy data into a cluster from the services to work on it, and then copy final result data back out to the services. (You'd want to avoid saving intermediate data, like temporary HDFS files or Hive tables that only matter in the middle of a workflow, because that data can be regenerated.) Once you can persist data to cloud storage services, you have a basis for making new clusters from nothing, and then pulling in data for them to work on.

Saving off metadata, such as Hive table mappings (metastore) and Sentry / Ranger authorization rules, to cloud storage is also a good idea. You can use cloud database services, like RDS on AWS, for that, or else general object storage services like S3 or ADLS. Metadata usually needs to apply to all of your clusters, transient or not, because it defines common business rules. The idea behind SDX is to make saving common metadata an easy and usual thing, so that it's easier to hook up new clusters to it.

Automating the creation of clusters is really important, especially for transient clusters that you'd bring up and tear down all the time. That's the purpose for tools like Altus Director and Cloudbreak. We also have customers who use general tools like Ansible, Chef, Puppet, or the like, since they are more familiar with them, or have standardized on them.

If you have automated cluster creation, and important data and metadata persisted in cloud storage services, then you've got the ingredients for successfully working with transient clusters in the cloud. I know this isn't a precise answer for how to do transient clusters, but hopefully I've given you some avenues to explore.
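As one rough illustration of the split between stateful and stateless nodes described above, here is a small Python sketch that picks out the nodes that would be safe candidates for teardown when load drops. The role labels and node names are illustrative strings I made up, not identifiers from any Cloudera API, and a real decision would also need to consider running jobs and capacity.

```python
# Role labels are illustrative strings, not identifiers from any Cloudera API.
STATELESS_ROLES = {"yarn-nodemanager", "spark-worker"}


def teardown_candidates(nodes: dict[str, set[str]]) -> list[str]:
    """Return the names of nodes that host only stateless compute roles."""
    return sorted(
        name
        for name, roles in nodes.items()
        if roles and roles <= STATELESS_ROLES
    )


# Hypothetical cluster layout: masters and data-bearing workers stay put.
cluster = {
    "master-1": {"hdfs-namenode", "yarn-resourcemanager"},
    "worker-1": {"hdfs-datanode", "yarn-nodemanager"},
    "worker-2": {"hdfs-datanode", "kudu-tablet-server"},
    "worker-3": {"hdfs-datanode"},
    "compute-1": {"yarn-nodemanager"},
    "compute-2": {"yarn-nodemanager", "spark-worker"},
}

print(teardown_candidates(cluster))  # ['compute-1', 'compute-2']
```

Note that worker-1 is not a candidate even though it runs a node manager, because it also holds HDFS data.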
03-19-2019
07:42 AM
1 Kudo
You appear to be using the username "scm" as an administrative user on the MySQL instance. The "scm" user must have permission to create and delete databases on the server. Normally, one would use the default MySQL "root" user, which already should have all of the necessary permissions.

- The string starting with "scm_" and ending with a random string is the generated name of the Cloudera Manager server database.
- The string "uxnlmrno" is the username for the user that shall be used to access the new database. Apparently, in your configuration file, you do not specify a usernamePrefix for the database. It is optional.

Director is running a script on the CM instance to perform the database work. (The script uses CM code, so it needs to run where CM is installed.) It is trying to reach the database server at the full hostname ending in ".internal", and it seems that the connectivity there is working. You can double-check by running a MySQL client from the CM instance itself.

You say you can create and drop databases from the MySQL console. Does that use the "scm" user or, perhaps, the "root" user?
02-19-2019
08:32 AM
Hi Rana,

The errors for "Opening `direct-tcpip` channel failed: Connection refused" happen under normal circumstances and can be ignored. I thought we'd suppressed them by now in Director logging, but maybe not. 🙂

Still, I see there's a failure to connect to Cloudera Manager at the end of the log snippet. That means that Director was trying to check on the status of a running command and couldn't reach Cloudera Manager. The IP address for that instance of Cloudera Manager is present in prior logging lines, so apparently the instance is at least reachable. Director did have to establish an SSH tunnel to talk to it:

    Successfully established tunnel to server 10.142.0.59 at 7180

There's not enough context in the logs here for me to go much further. The pipeline thread for the Cloudera Manager failure, "p-66c81a04d10f-BootstrapClouderaManagerAgent", doesn't appear elsewhere in the sample, but other threads are working with the instance successfully. Also, the beginning of line 68 is cut off, so I'm concerned that multiple Director instances are running at the same time.

General ideas to troubleshoot:

- Run Director inside GCP so that it can make direct connections to all of the new instances. Once that's working, try running Director outside of it, which is where I think it's running right now.
- At first, just do one thing at a time in Director. It makes troubleshooting the logs easier. (We're actually working on a way to split logging out by cluster to help here.)
- If that instance is still up, see if Cloudera Manager is in fact running. Maybe it died for some reason? Logs in /var/log/cloudera-scm-server are usually informative.

Bill
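For the last check in that list, a quick way to see whether anything is listening on the Cloudera Manager port is a plain TCP connection attempt. This is a minimal Python sketch with a made-up function name; it only tells you the port accepts connections, not that Cloudera Manager is healthy, and it should be run from a machine that can reach the instance directly (for example, one inside the same GCP network).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the address and port from the log line above, the call would be `port_is_open("10.142.0.59", 7180)`. If it returns False while the instance itself is up, the logs in /var/log/cloudera-scm-server are the next place to look.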
11-08-2018
05:14 AM
1 Kudo
Hello yarivgraf,

Good news: A community member implemented subnetwork support in the Google plugin, and the work was merged several weeks ago.

https://github.com/cloudera/director-google-plugin/pull/150

The next 6.x release of Cloudera Altus Director will come packaged with a new plugin release that includes this change. In the meantime, you can build the plugin and install it into your existing Altus Director 2.8 or 6.0 installations and it should work.

* For Director 2.8, build and install the plugin from the v1.x branch.
* For Director 6.0, build and install the plugin from the v2.0.x branch.

The plugin's README describes the build process and links to some docs on installing the plugin in Altus Director. For the latter, you basically place the plugin JAR into Altus Director's plugins directory, replacing the prior Google plugin JAR.