Member since 05-14-2019
26 Posts
0 Kudos Received
3 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6458 | 07-23-2019 10:42 AM |
| | 3212 | 07-22-2019 03:39 PM |
| | 20385 | 07-22-2019 02:44 PM |
09-26-2019
03:34 PM
After upgrading to 6.3 and setting the following in application.properties:

spring.datasource.minimumIdle=100
spring.datasource.maximumPoolSize=200

I can see in the HikariConfig DEBUG logs that those settings are being respected, and the CloudWatch charts show the same. Thanks again!
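As a quick sanity check before restarting, a small Python sketch can read the pool settings back out of application.properties and confirm that minimumIdle does not exceed maximumPoolSize (HikariCP expects the idle floor to sit at or below the pool ceiling). The parser is an assumption-laden simplification: it handles only plain `key=value` lines and comments, not the full Java properties format (continuations, `:` separators, escapes). The function names are illustrative.

```python
"""Sanity-check HikariCP pool settings in a simple application.properties file."""


def parse_properties(text: str) -> dict[str, str]:
    """Parse plain key=value lines, skipping blanks and # or ! comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def check_pool(props: dict[str, str]) -> tuple[int, int]:
    """Return (minimumIdle, maximumPoolSize), raising ValueError if inconsistent."""
    min_idle = int(props["spring.datasource.minimumIdle"])
    max_pool = int(props["spring.datasource.maximumPoolSize"])
    if max_pool < 1:
        raise ValueError("maximumPoolSize must be at least 1")
    if not 0 <= min_idle <= max_pool:
        raise ValueError(
            f"minimumIdle={min_idle} must be between 0 and maximumPoolSize={max_pool}"
        )
    return min_idle, max_pool
```

With the two settings above, `check_pool(parse_properties(text))` returns `(100, 200)`.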
09-26-2019
02:51 PM
Thanks for this info (and link) and duly noted on troubleshooting in the future!
07-29-2019
08:29 AM
Hey Ben,

Thanks for getting back to me, and sorry about the delay in responding. I ended up testing this on Friday and everything went smoothly with the CSD. I did run into some issues with a custom parcel, where CM created the wrong *.sha file when downloading it, but putting the parcel in /opt/cloudera/parcel-repo cleared that up. I may make that a separate forum post, as it seems a bit buggy.

Regards,
Dan
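For anyone hitting the same *.sha problem: my understanding of the manual parcel-install convention is that each parcel in /opt/cloudera/parcel-repo sits next to a `<parcel file>.sha` file holding the parcel's SHA-1 hex digest. A minimal Python sketch that regenerates that file is below; treat the SHA-1 format as an assumption to verify against your CM version's documentation, and note that `write_sha_file` is a hypothetical helper name.

```python
"""Write <parcel>.sha containing the parcel's SHA-1 hex digest (assumed CM convention)."""
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large parcels need not fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha_file(parcel: Path) -> Path:
    """Create <parcel>.sha beside the parcel and return the new file's path."""
    sha_path = parcel.with_name(parcel.name + ".sha")
    sha_path.write_text(sha1_of(parcel))
    return sha_path
```

For example, `write_sha_file(Path("/opt/cloudera/parcel-repo/EXAMPLE-1.0-el7.parcel"))` (hypothetical parcel name) would create `EXAMPLE-1.0-el7.parcel.sha` alongside it.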
07-24-2019
09:30 AM
Not sure if this is the right subsection of the community forums, but I am curious whether there are fundamental changes that would prevent a custom CSD that was written for (and works with) CM 5.x from working on a CM 6.x build. I will more than likely end up testing this myself soon, but I am wondering whether there are any caveats I should be aware of or watch out for. Cheers!
Labels: Cloudera Manager
07-24-2019
09:04 AM
@Mike Wilson On that note, I managed to find another issue with the validator. If you create a VirtualInstanceGroup similar to the Python example here: https://github.com/cloudera/director-sdk/blob/master/python-client-samples/cluster.py#L159 and then change the VirtualInstanceGroup's name to masters.1 (adding an invalid character such as "."), Director creates the cluster successfully, but users in the UI can then no longer repair nodes, grow, shrink, or clone the cluster. The UI simply grays out the Continue button and gives the user no feedback in either the logs or the UI. Cheers!
07-24-2019
09:00 AM
@Mike Wilson is there any way to disable this logging or call? It floods the logs, because every health check for every deployment runs it:

grep com.cloudera.api.ext.ClouderaManagerClientProxy /var/log/cloudera-director-server/application.log* | wc -l
13063

I probably need to configure logback.xml, but I don't want to silence a real error.
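Before silencing the logger, it may help to check which levels those lines are logged at, so a real error doesn't get hidden. The sketch below is a Python stand-in for the grep | wc -l pipeline that also tallies the level field. It assumes each line follows the layout seen in Director's application.log elsewhere in these posts (`[timestamp] LEVEL [thread] - - - logger - message`); the glob pattern mirrors the grep above, and the function names are illustrative.

```python
"""Count log lines mentioning one logger, broken down by level (assumed Director log layout)."""
import glob
import re
from collections import Counter
from collections.abc import Iterable

LOGGER = "com.cloudera.api.ext.ClouderaManagerClientProxy"
# Assumed layout: "[2019-07-23 17:31:18.585 +0000] ERROR [thread] - - - logger - message"
LEVEL_RE = re.compile(r"^\[[^\]]*\]\s+(TRACE|DEBUG|INFO|WARN|ERROR)\b")


def count_levels(lines: Iterable[str], logger: str = LOGGER) -> Counter:
    """Tally matching lines by level; lines without a parsable level count as UNKNOWN."""
    counts: Counter = Counter()
    for line in lines:
        if logger not in line:
            continue
        match = LEVEL_RE.match(line)
        counts[match.group(1) if match else "UNKNOWN"] += 1
    return counts


def count_in_files(
    pattern: str = "/var/log/cloudera-director-server/application.log*",
) -> Counter:
    """Sum the per-level counts across all files matching the glob (plain text only)."""
    total: Counter = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            total += count_levels(fh)
    return total
```

The total across all levels should match the grep count; if everything lands in INFO or DEBUG, raising that one logger's level in logback.xml is less likely to hide a real error.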
07-23-2019
10:42 AM
This is because the process's limits file (/proc/<director_pid>/limits) shows a "Max open files" of 1024, which is too low for most operations. Since Director runs under systemd on RHEL/CentOS 7, one solution is the following:

# make a folder for custom systemd changes for this service
mkdir -p /etc/systemd/system/cloudera-director-server.service.d/
# make an override conf file so that a Director upgrade will not break the changes
vim /etc/systemd/system/cloudera-director-server.service.d/override.conf
# then add the following to that file and save/quit it
[Service]
LimitNOFILE=65536
# next reload the daemon
systemctl daemon-reload
# finally restart Director
systemctl restart cloudera-director-server

Then if you check the limits file of the new process, it will show 65536 as the "Max open files". Hopefully this helps someone in the future. Cheers!
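To confirm the override took effect without eyeballing the file, a short Python sketch can pull the "Max open files" row out of /proc/<pid>/limits (Linux only). Finding Director's PID is left to the caller, for example via `systemctl show -p MainPID cloudera-director-server`; the function names are illustrative.

```python
"""Read the "Max open files" soft/hard limits of a process from /proc/<pid>/limits (Linux)."""

ROW = "Max open files"


def parse_max_open_files(limits_text: str) -> tuple[str, str]:
    """Return (soft, hard) as strings, since either may be 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith(ROW):
            # Remaining columns: soft limit, hard limit, units ("files")
            fields = line[len(ROW):].split()
            return fields[0], fields[1]
    raise ValueError(f"no '{ROW}' row found")


def max_open_files(pid: int) -> tuple[str, str]:
    """Read and parse /proc/<pid>/limits for the given process."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_max_open_files(fh.read())
```

After the restart, `max_open_files(<director_pid>)` should report `("65536", "65536")`.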
07-23-2019
10:42 AM
When provisioning clusters, nodes are often cancelled because Cloudera Director cannot open any more file handles:
[2019-07-23 17:31:18.585 +0000] ERROR [p-ebcce1c842e9-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - com.cloudera.launchpad.pipeline.util.PipelineRunner: Attempt to execute job failed
java.net.SocketException: Too many open files
at java.net.Socket.createImpl(Socket.java:460)
at java.net.Socket.connect(Socket.java:587)
at net.schmizz.sshj.SocketClient.connect(SocketClient.java:126)
at com.cloudera.launchpad.sshj.SshJClient.attemptConnection(SshJClient.java:343)
at com.cloudera.launchpad.sshj.SshJClient.attemptConnection(SshJClient.java:318)
at com.cloudera.launchpad.sshj.SshJClient.access$000(SshJClient.java:68)
How can I increase the file handle limit? Neither ulimit nor limits.conf seems to work.
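Some context that likely explains why those two approaches have no effect: `ulimit` only changes the limit of the shell it runs in (and that shell's children), and limits.conf is applied by PAM at login, so neither reaches a service started by systemd. While reproducing the error, it can also help to see how many descriptors Director actually holds. On Linux each open descriptor is an entry under /proc/<pid>/fd, so a minimal Python sketch (roughly `ls /proc/<pid>/fd | wc -l`) is:

```python
"""Count a process's open file descriptors via /proc/<pid>/fd (Linux only)."""
import os


def open_fd_count(pid: int) -> int:
    # Each entry in /proc/<pid>/fd is one open descriptor; inspecting
    # another user's process usually requires root.
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage_ratio(pid: int, soft_limit: int) -> float:
    """Fraction of the soft "Max open files" limit currently in use."""
    return open_fd_count(pid) / soft_limit
```

A ratio approaching 1.0 during provisioning would be consistent with the "Too many open files" failures above.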
Labels: Cloudera Enterprise Data Hub
07-22-2019
03:39 PM
Yep, that works fine! I am now explicitly creating the InstanceTemplates first so Director can validate them before I create the DeploymentTemplate or the ClusterTemplates. The issue was that the InstanceTemplate names were longer than 40 characters, yet Director still accepted the ClusterTemplate that used those InstanceTemplates. Cheers!
07-22-2019
02:59 PM
I have a strong suspicion that Director accepted a ClusterTemplate with a VirtualInstanceGroup made up of VirtualInstances whose InstanceTemplates had names longer than 40 characters (41, to be exact, for the one group that is failing). I found the following message a moment ago while building a smaller PoC that only creates the InstanceTemplates, so that they would appear in Director and I could add them manually to see whether the issue was the same: "The name must have a length of 2-40 characters, the first and last of which must be alphanumeric. The rest may include space, underscore, and hyphen." I will be trimming the names to under 40 characters and deploying a new cluster to see if the issue still persists. However, I do believe it is a bug that Director can get into this state!
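Until Director rejects such names up front, a client-side check before submitting templates could catch them. The Python sketch below encodes the rule quoted in the error message; reading "alphanumeric" as ASCII letters and digits is an assumption, as is applying the same rule to other names such as the masters.1 VirtualInstanceGroup mentioned above. The helper names are hypothetical.

```python
"""Pre-validate InstanceTemplate names against the rule quoted in Director's message."""
import re

# 2-40 characters; first and last alphanumeric (assumed ASCII); the characters
# in between may also be space, underscore, or hyphen.
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 _-]{0,38}[A-Za-z0-9]")


def is_valid_template_name(name: str) -> bool:
    """True if the whole name satisfies the quoted 2-40 character rule."""
    return NAME_RE.fullmatch(name) is not None


def invalid_names(names: list[str]) -> list[str]:
    """Return the names that would fail validation, in input order."""
    return [n for n in names if not is_valid_template_name(n)]
```

Running `invalid_names(...)` over every template name before creating the ClusterTemplate would have flagged the 41-character name in this case.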