About Da

Da · ‎09-26-2019

After upgrading to 6.3 and setting the following in application.properties:

spring.datasource.minimumIdle=100
spring.datasource.maximumPoolSize=200

I can see in the DEBUG logs of HikariConfig that those settings are being respected, and the CloudWatch charts show the same. Thanks again!
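For anyone who wants to sanity-check the two pool settings before restarting, here is a minimal sketch in Python. It only assumes the two property names quoted in this post; the helper names `parse_properties` and `check_pool_settings` are made up for illustration and are not part of Director or HikariCP.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_pool_settings(props, prefix="spring.datasource."):
    """Return a list of problems; an empty list means the pair looks sane."""
    try:
        min_idle = int(props[prefix + "minimumIdle"])
        max_pool = int(props[prefix + "maximumPoolSize"])
    except KeyError as missing:
        return ["missing property: " + str(missing.args[0])]
    except ValueError as bad:
        return ["non-integer value: " + str(bad)]

    problems = []
    if max_pool < 1:
        problems.append("maximumPoolSize must be at least 1")
    if min_idle < 0:
        problems.append("minimumIdle must not be negative")
    if min_idle > max_pool:
        # An idle floor above the pool cap cannot be honoured as written.
        problems.append("minimumIdle is larger than maximumPoolSize")
    return problems
```

With the values from this post (100 and 200) the check returns an empty list. It does not replace looking at the HikariConfig DEBUG output, which is still the place to confirm what the pool actually picked up.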

Da · ‎09-26-2019

Thanks for this info (and link) and duly noted on troubleshooting in the future!

Da · ‎07-29-2019

Hey Ben, thanks for getting back to me, and sorry about the delay in responding. I ended up testing this on Friday and everything went smoothly with the CSD. I did run into some issues with a custom parcel and CM generating the wrong *.sha file upon downloading, but putting the parcel in /opt/cloudera/parcel-repo helped clear that up. I may end up making that a different forum post as it seems a bit buggy. Regards, Dan

Da · ‎07-24-2019

Not sure if this is the right subsection of the community forums, but I am curious whether there are fundamental changes that would prevent a custom CSD that was written for (and works on) CM 5.x from working on a CM 6.x build. I will more than likely end up testing this myself soon, but I am wondering if there are any caveats that I should be aware of or watch out for. Cheers!

Da · ‎07-24-2019

@Mike Wilson On that note, I managed to find another issue with the validator. If you create a VirtualInstanceGroup similar to the Python example here: https://github.com/cloudera/director-sdk/blob/master/python-client-samples/cluster.py#L159 and then change the name in the VirtualInstanceGroup to masters.1 (adding an invalid character such as a .), the cluster is created successfully, but the user will then be unable to repair nodes, grow, shrink, or clone the cluster in the UI. The UI just grays out the Continue button and gives the user no feedback, either in the logs or in the UI. Cheers!

Da · ‎07-24-2019

@Mike Wilson is there any way to disable this logging or call? It really spams the logs, since every health check for every deployment triggers it:

grep com.cloudera.api.ext.ClouderaManagerClientProxy /var/log/cloudera-director-server/application.log* | wc -l
13063

I probably need to configure logback.xml, but I don't want to silence a real error.
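Before touching logback.xml, it may help to see whether those lines are actually errors. Here is a rough Python sketch that breaks the matches down by log level. It assumes each log line starts with a bracketed timestamp followed by the level, as in the Director log excerpt from the "Too many open files" thread; the function names are hypothetical.

```python
import glob
import re
from collections import Counter

# Assumed line layout: "[<timestamp>] LEVEL [<thread>] ..." (an assumption
# based on one log excerpt, not a documented format).
LINE_RE = re.compile(r"^\[[^\]]+\]\s+(?P<level>[A-Z]+)\s+\[")


def count_levels(lines, logger):
    """Count lines mentioning `logger`, grouped by log level."""
    counts = Counter()
    for line in lines:
        if logger not in line:
            continue
        match = LINE_RE.match(line)
        # Lines without a recognisable prefix (e.g. stack trace
        # continuations) are grouped under UNKNOWN.
        counts[match.group("level") if match else "UNKNOWN"] += 1
    return counts


def count_levels_in_files(pattern, logger):
    """Apply count_levels to every file matching a glob pattern."""
    totals = Counter()
    for path in glob.glob(pattern):
        with open(path, errors="replace") as handle:
            totals.update(count_levels(handle, logger))
    return totals
```

For example, `count_levels_in_files("/var/log/cloudera-director-server/application.log*", "com.cloudera.api.ext.ClouderaManagerClientProxy")` gives a per-level breakdown of the same lines the grep above counts. If nearly everything is INFO or DEBUG, raising the level for that one logger is less likely to hide a real error.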

Da · ‎07-23-2019

This is because the limits file of the process (/proc/<director_pid>/limits) has a "Max open files" value of 1024, which is too low for most operations. Since Director runs under systemd on RHEL/CentOS 7, limits.conf and ulimit do not apply to it; the limit has to be set on the service itself:

# make a folder for custom systemd changes for this service
mkdir -p /etc/systemd/system/cloudera-director-server.service.d/

# make an override conf file so that a Director upgrade will not break the changes
vim /etc/systemd/system/cloudera-director-server.service.d/override.conf

# then add the following in that file and save/quit it
[Service]
LimitNOFILE=65536

# next reload the daemon
systemctl daemon-reload

# finally restart Director
systemctl restart cloudera-director-server

Then, if you check the limits file of the new process, you will see 65536 as the "Max open files". Hopefully this can help someone in the future. Cheers!
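If you want to verify the result from a script rather than by eye, here is a small Python sketch that reads the "Max open files" row from a process's limits file. It assumes the usual Linux layout of /proc/<pid>/limits (limit name, soft limit, hard limit, units); the function names are just for illustration.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row as strings.

    Values stay strings because the kernel may report 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ["Max", "open", "files", <soft>, <hard>, "files"]
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' row found")


def max_open_files_for_pid(pid):
    """Read /proc/<pid>/limits (Linux only) and return (soft, hard)."""
    with open("/proc/{}/limits".format(pid)) as handle:
        return parse_max_open_files(handle.read())
```

Calling `max_open_files_for_pid(<director_pid>)` before and after the restart should show the soft limit change from 1024 to 65536 if the override was picked up.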

Da · ‎07-23-2019

Often when provisioning clusters, nodes are cancelled because Cloudera Director cannot open more file handles:

[2019-07-23 17:31:18.585 +0000] ERROR [p-ebcce1c842e9-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - com.cloudera.launchpad.pipeline.util.PipelineRunner: Attempt to execute job failed
java.net.SocketException: Too many open files
    at java.net.Socket.createImpl(Socket.java:460)
    at java.net.Socket.connect(Socket.java:587)
    at net.schmizz.sshj.SocketClient.connect(SocketClient.java:126)
    at com.cloudera.launchpad.sshj.SshJClient.attemptConnection(SshJClient.java:343)
    at com.cloudera.launchpad.sshj.SshJClient.attemptConnection(SshJClient.java:318)
    at com.cloudera.launchpad.sshj.SshJClient.access$000(SshJClient.java:68)

How can I increase the file handle limit, given that ulimit and limits.conf do not seem to work?

Da · ‎07-22-2019

Yep, that works fine! So I am explicitly creating the templates first so Director can validate that it likes them before creating the DeploymentTemplate or the ClusterTemplates. The issue was that the names of the InstanceTemplates were longer than 40 characters, yet Director accepted the ClusterTemplate that used those InstanceTemplates. Cheers!

Da · ‎07-22-2019

I have a strong suspicion that Director accepted a ClusterTemplate with a VirtualInstanceGroup comprised of VirtualInstances that had InstanceTemplates with names longer than 40 characters (41, to be exact, for the one group that is failing). I discovered the following a moment ago when making a smaller PoC that only creates the InstanceTemplates, so that they would appear in Director and I could add them manually to see if the issue was the same: "(The name must have a length of 2-40 characters, the first and last of which must be alphanumeric. The rest may include space, underscore, and hyphen.)" I will be trimming the names to under 40 characters and deploying a new cluster to see if the issue still persists. However, I do believe it is a bug that Director can get into this state!
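Until Director rejects these itself, a client-side pre-check is one way to catch bad names before submitting a template. Below is a minimal Python sketch of the rule as quoted in the error message. It is not Director's own validator: the function name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits and that the middle characters may also be alphanumeric.

```python
import re

# First and last character alphanumeric; up to 38 characters in between
# drawn from alphanumerics, space, underscore and hyphen (2-40 total).
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 _-]{0,38}[A-Za-z0-9]")


def is_valid_template_name(name):
    """Return True if `name` satisfies the quoted 2-40 character rule."""
    return NAME_RE.fullmatch(name) is not None
```

Running every InstanceTemplate and VirtualInstanceGroup name through this check before calling the API would flag both cases seen here: a 41-character name and a name containing a "." such as masters.1.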

Last Visited	‎03-06-2020 09:17 PM

Member Since	‎05-14-2019 02:24 PM
Posts	26

Cloudera Community

Re: Too many open files

Re: "A template must be specified"

Re: The request was rejected because the URL conta...

Re: How to increase database pool for Director?

Re: API call to Cloudera Manager failed for Cloude...

Re: Custom CSDs from CM 5.x to 6.x

Custom CSDs from CM 5.x to 6.x

Re: "A template must be specified"

Re: API call to Cloudera Manager failed for Cloude...

Re: Too many open files

Too many open files

Re: "A template must be specified"

Re: "A template must be specified"