Created on 11-13-2017 02:30 PM - edited 09-16-2022 05:30 AM
Hi.
I have a 10-node CDH cluster that I deployed using the Cloudera Director (2.6) client's remote-bootstrap. I've since upgraded Cloudera Manager and CDH from 5.12.1 to 5.13.0, but Cloudera Director does not reflect the new cluster version; it still shows the parcel versions the cluster was originally bootstrapped with.
How can I update Cloudera Director to reflect the correct version of my cluster?
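For reference, this is roughly how I've been comparing what Director reports against the live deployment. The /api/v10/login endpoint is the one that shows up in my Director logs below; the clusters endpoint path, the default port 7189, and the session-cookie handling are my assumptions and may differ for your install:

# Log in to the Director API and save the session cookie
curl -s -c /tmp/director-session -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"admin"}' \
  http://localhost:7189/api/v10/login

# Fetch what Director has recorded for the cluster (names match my setup, "wilbur")
# and inspect the product/parcel versions in the returned JSON
curl -s -b /tmp/director-session \
  "http://localhost:7189/api/v10/environments/wilbur/deployments/wilbur%20Deployment/clusters/wilbur"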
Created 11-13-2017 04:13 PM
Director's deployment and cluster refreshers should detect this sort of thing and update the deployment and cluster representations (respectively) in the database with the updated version information. Can you please check the server application.log to see if either of them encountered problems?
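For example, something like this on the Director server should surface any refresher problems (assuming the default log location for a package install; adjust the path if yours differs):

# Look for refresher activity and any warnings/errors in the Director server log
grep -iE 'refresh|warn|error' /var/log/cloudera-director-server/application.log | less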
Created 11-14-2017 07:42 AM
Hi
Application logs on the Cloudera Director server show no errors or warnings other than SSH host key auto-accept warnings.
Created on 11-14-2017 08:14 AM - edited 11-14-2017 10:19 AM
I can see queries from my Director server in the Cloudera Manager logs, so the two are still talking to one another. Director reports the cluster as healthy but doesn't reflect the new version. After the failed attempt to add a worker node through the Director UI (mentioned in my other post), the Director console no longer reports the status of my cluster services, and the only cluster actions available to me are clone and terminate. I don't even have the option to clean up the failed worker node from the Director UI. Meanwhile, Cloudera Manager reports the cluster as healthy.
Director: [screenshot]
Cloudera Manager: [screenshot]
According to this page, Director should refresh its information after an upgrade of a CDH cluster. I waited at least an hour after the upgrade before checking Cloudera Director and trying to "grow" my worker nodes.
Created 11-14-2017 11:41 AM
I stopped and started cloudera-director-server in an attempt to force a refresh of its cluster data. Tailing the application log, I can see that it successfully communicates with my cluster, but it reports the state as 'not ready' because of the failed attempt to add a worker node (mentioned earlier in the thread).
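For reference, this is all I did on the Director host (a standard service restart plus a log tail; the log path is the usual default for a package install):

# Restart the Director server and watch its log while it refreshes
sudo service cloudera-director-server restart
tail -f /var/log/cloudera-director-server/application.log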
We plan to use Director to manage customer clusters, and I'm concerned by how easily these systems get out of sync, especially since I didn't perform any actions that are documented as a cause of sync issues. My cluster is up and healthy, and Director should be able to determine this. How can I recover?
In the log below, 172.20.108.55 is the Cloudera Manager IP of my CDH cluster.
[2017-11-14 16:34:14.206 +0000] INFO [main] - - - - - c.c.l.p.autorepair.AutoRepairService: Adding auto-repair policy runner for ClusterKey{environmentName='wilbur', deploymentName='wilbur Deployment', clusterName='wilbur'}
[2017-11-14 16:34:14.216 +0000] INFO [main] - - - - - c.c.l.p.autorepair.PolicyHandler: Cluster is not ready. Skipping policies evaluation.
[2017-11-14 16:34:14.217 +0000] INFO [main] - - - - - com.cloudera.launchpad.Server: Started Server in 27.715 seconds (JVM running for 28.311)
[2017-11-14 16:34:14.271 +0000] INFO [io-thread-1] - - - - - ssh:172.20.108.55: https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.13.0/RPMS/x86_64/cloudera-manager-server-5.13.0-1.cm5130.p0.55.el7.x86_64.rpm
[2017-11-14 16:34:14.435 +0000] INFO [task-thread-4] - - - - - c.c.l.task.RefreshDeployments: Backing up Cloudera Manager configuration for Deployment wilbur:wilbur Deployment
[2017-11-14 16:34:14.466 +0000] INFO [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Starting pipeline 'd135b711-6712-4800-8f07-70d908f2b512' with root job com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig and listener com.cloudera.launchpad.pipeline.listener.NoopPipelineStageListener
[2017-11-14 16:34:14.571 +0000] INFO [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Create new runner thread for pipeline 'd135b711-6712-4800-8f07-70d908f2b512'
[2017-11-14 16:34:14.784 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig - c.c.l.pipeline.util.PipelineRunner: >> BackupClouderaManagerConfig/3 [PluggableComputeInstance{ipAddress=172.20.108.55, delegate=null, hostEndpoints=[HostEndpoint{hostAd ...
[2017-11-14 16:34:15.097 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404748, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:15.206 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo tar -cpzf /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz $(sudo ls -d /etc/cloudera ...
[2017-11-14 16:34:15.206 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:15.249 +0000] WARN [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:15.771 +0000] INFO [io-thread-1] - - - - - ssh:172.20.108.55: ls: cannot access /var/lib/cloudera-scm-agent/agent-cert: No such file or directory
[2017-11-14 16:34:15.771 +0000] INFO [io-thread-1] - - - - - ssh:172.20.108.55: ls: cannot access /var/lib/cloudera-scm-server/certmanager: No such file or directory
[2017-11-14 16:34:15.771 +0000] INFO [io-thread-1] - - - - - ssh:172.20.108.55: tar: Removing leading `/' from member names
[2017-11-14 16:34:15.772 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:15.850 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo chown skynet /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172 ...
[2017-11-14 16:34:15.850 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:15.881 +0000] WARN [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:16.436 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:16.560 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.l.pipeline.util.PipelineRunner: >> DownloadFileAsByteArrayJob/3 [/tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172-24-109-63.va.r4cl ...
[2017-11-14 16:34:16.561 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:16.602 +0000] WARN [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:17.087 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: permissions = 600
[2017-11-14 16:34:17.087 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: mtime = 1510677255
[2017-11-14 16:34:17.087 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: atime = 1510677255
[2017-11-14 16:34:17.127 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404749, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:17.292 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobUncheckedWithOutputLogging/3 [sudo rm /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172-20-108-55
[2017-11-14 16:34:17.293 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:17.357 +0000] WARN [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:18.128 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404750, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:18.249 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig$SaveConfigBlobToDatabase - c.c.l.pipeline.util.PipelineRunner: >> BackupClouderaManagerConfig$SaveConfigBlobToDatabase/3 [[B@332d0fac, wilbur, wilbur Deployment]
[2017-11-14 16:34:18.284 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig$SaveConfigBlobToDatabase - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:18.343 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - - - c.c.l.p.s.PipelineRepositoryService: Pipeline 'd135b711-6712-4800-8f07-70d908f2b512': RUNNING -> COMPLETED
[2017-11-14 16:34:18.691 +0000] INFO [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Deleting pipeline 'd135b711-6712-4800-8f07-70d908f2b512'
[2017-11-14 16:34:18.811 +0000] INFO [task-thread-4] - - - - - c.c.l.task.RefreshDeployments: Finished refreshing all pre-existing Deployment models
[2017-11-14 16:35:08.385 +0000] INFO [task-thread-9] - - - - - c.c.l.m.r.DeploymentsReporter: Enqueueing all deployments for usage reporting
[2017-11-14 16:35:08.398 +0000] INFO [task-thread-9] - - - - - c.c.l.m.r.DeploymentsReporter: Enqueueing 0 deployments for usage reporting
[2017-11-14 16:37:33.613 +0000] INFO [qtp1914740115-63] bca6db0c-a58f-4846-96eb-38eef096bb76 POST /api/v10/login - - c.c.l.a.c.AuthenticationResource: Logging in admin via API
Created 11-20-2017 11:39 AM
Reconciling the cluster using the CRaSH console got things back in sync for me.
See the following post: https://community.cloudera.com/t5/Cloudera-Director-Cloud-based/Director-pipeline-SUSPENDED-UPDATE-F...
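Rough outline of what that looked like on the Director host. The console port, user, and the exact reconcile command are all placeholders here; the linked post has the precise steps for your Director version:

# Connect to Director's CRaSH admin console
# (<crash-console-port> and the admin user are assumptions; check the linked post / your config)
ssh -p <crash-console-port> admin@localhost
help    # list the console commands, then run the cluster reconcile described in the linked post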