Contributor
Posts: 26
Registered: 10-02-2017
Accepted Solution

Cloudera Director not reflecting upgraded CDH cluster version

Hi.

 

I have a 10-node CDH cluster that I deployed using the Cloudera Director (2.6) client's remote-bootstrap. I've since upgraded Cloudera Manager and CDH from 5.12.1 to 5.13.0. Cloudera Director does not reflect the new cluster version; instead, it shows the parcel versions the cluster was originally bootstrapped with.

 

How can I update Cloudera Director to reflect the correct version of my cluster?

Cloudera Employee
Posts: 52
Registered: 10-28-2014

Re: Cloudera Director not reflecting upgraded CDH cluster version

Director's deployment and cluster refreshers should detect this sort of thing and update the deployment and cluster representations (respectively) in the database with the updated version information. Can you please check the server application.log to see if either of them encountered problems?
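As a quick first pass, something like the following scans the server log for refresher problems while skipping the routine host-key warnings. The log path is the default location on most installs; adjust it if yours differs.

```shell
# Default log location is an assumption; adjust for your install.
LOG=${LOG:-/var/log/cloudera-director-server/application.log}

# Surface errors/warnings, ignoring the benign host-key auto-accept
# warnings that Director logs on every SSH connection.
grep -E 'ERROR|WARN' "$LOG" | grep -v 'TrustAnyHostKeyVerifier' || true
```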

 

Contributor
Posts: 26
Registered: 10-02-2017

Re: Cloudera Director not reflecting upgraded CDH cluster version

Hi

 

Application logs on Cloudera Director server show no errors/warnings other than SSH key auto-accept warnings.

Contributor
Posts: 26
Registered: 10-02-2017

Re: Cloudera Director not reflecting upgraded CDH cluster version


I can see queries from my Director server in the Cloudera Manager logs, so the two are still talking to one another. Director reports the cluster as healthy but doesn't reflect the new version. After the failed attempt to add a worker node via the Director UI (mentioned in my other post), the Director console no longer reports the status of my cluster services, nor do I have any cluster actions available other than clone/terminate. I don't even have the option to clean up the failed worker node from the Director UI. Meanwhile, the Cloudera Manager UI reports the cluster as healthy.

 

Director:

[screenshot: image.png]

Cloudera Manager:

[screenshot: image.png]

 

According to this page, Director should refresh its information after an upgrade of a CDH cluster. I waited at least an hour after the upgrade before checking Cloudera Director and trying to "grow" my worker nodes.

Contributor
Posts: 26
Registered: 10-02-2017

Re: Cloudera Director not reflecting upgraded CDH cluster version

I stopped and started cloudera-director-server in an attempt to force a refresh of its cluster data. Tailing the application log, I can see that it successfully communicates with my cluster, but it reports the state as 'not ready' because of the failed attempt to add a worker node (mentioned earlier in the thread).

 

We plan to use Director to manage customer clusters, and I'm concerned by how easily these systems get out of sync, especially since I didn't perform any actions that are documented to cause sync issues. My cluster is up and healthy, and Director should be able to determine this. How can I recover?

 

In the log below, 172.20.108.55 is the Cloudera Manager IP of my CDH cluster.

 

[2017-11-14 16:34:14.206 +0000] INFO  [main] - - - - - c.c.l.p.autorepair.AutoRepairService: Adding auto-repair policy runner for ClusterKey{environmentName='wilbur', deploymentName='wilbur Deployment', clusterName='wilbur'}
[2017-11-14 16:34:14.216 +0000] INFO  [main] - - - - - c.c.l.p.autorepair.PolicyHandler: Cluster is not ready. Skipping policies evaluation.
[2017-11-14 16:34:14.217 +0000] INFO  [main] - - - - - com.cloudera.launchpad.Server: Started Server in 27.715 seconds (JVM running for 28.311)
[2017-11-14 16:34:14.271 +0000] INFO  [io-thread-1] - - - - - ssh:172.20.108.55: https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.13.0/RPMS/x86_64/cloudera-manager-server-5.13.0-1.cm5130.p0.55.el7.x86_64.rpm
[2017-11-14 16:34:14.435 +0000] INFO  [task-thread-4] - - - - - c.c.l.task.RefreshDeployments: Backing up Cloudera Manager configuration for Deployment wilbur:wilbur Deployment
[2017-11-14 16:34:14.466 +0000] INFO  [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Starting pipeline 'd135b711-6712-4800-8f07-70d908f2b512' with root job com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig and listener com.cloudera.launchpad.pipeline.listener.NoopPipelineStageListener
[2017-11-14 16:34:14.571 +0000] INFO  [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Create new runner thread for pipeline 'd135b711-6712-4800-8f07-70d908f2b512'
[2017-11-14 16:34:14.784 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig - c.c.l.pipeline.util.PipelineRunner: >> BackupClouderaManagerConfig/3 [PluggableComputeInstance{ipAddress=172.20.108.55, delegate=null, hostEndpoints=[HostEndpoint{hostAd ...
[2017-11-14 16:34:15.097 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404748, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:15.206 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo tar -cpzf /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz $(sudo ls -d /etc/cloudera ...
[2017-11-14 16:34:15.206 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:15.249 +0000] WARN  [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:15.771 +0000] INFO  [io-thread-1] - - - - - ssh:172.20.108.55: ls: cannot access /var/lib/cloudera-scm-agent/agent-cert: No such file or directory
[2017-11-14 16:34:15.771 +0000] INFO  [io-thread-1] - - - - - ssh:172.20.108.55: ls: cannot access /var/lib/cloudera-scm-server/certmanager: No such file or directory
[2017-11-14 16:34:15.771 +0000] INFO  [io-thread-1] - - - - - ssh:172.20.108.55: tar: Removing leading `/' from member names
[2017-11-14 16:34:15.772 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:15.850 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo chown skynet /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172 ...
[2017-11-14 16:34:15.850 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:15.881 +0000] WARN  [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:16.436 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:16.560 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.l.pipeline.util.PipelineRunner: >> DownloadFileAsByteArrayJob/3 [/tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172-24-109-63.va.r4cl ...
[2017-11-14 16:34:16.561 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:16.602 +0000] WARN  [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:17.087 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: permissions = 600
[2017-11-14 16:34:17.087 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: mtime = 1510677255
[2017-11-14 16:34:17.087 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: atime = 1510677255
[2017-11-14 16:34:17.127 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404749, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:17.292 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobUncheckedWithOutputLogging/3 [sudo rm /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172-20-108-55
[2017-11-14 16:34:17.293 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:17.357 +0000] WARN  [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:18.128 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404750, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:18.249 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig$SaveConfigBlobToDatabase - c.c.l.pipeline.util.PipelineRunner: >> BackupClouderaManagerConfig$SaveConfigBlobToDatabase/3 [[B@332d0fac, wilbur, wilbur Deployment]
[2017-11-14 16:34:18.284 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig$SaveConfigBlobToDatabase - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:18.343 +0000] INFO  [p-70d908f2b512-BackupClouderaManagerConfig] - - - - - c.c.l.p.s.PipelineRepositoryService: Pipeline 'd135b711-6712-4800-8f07-70d908f2b512': RUNNING -> COMPLETED
[2017-11-14 16:34:18.691 +0000] INFO  [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Deleting pipeline 'd135b711-6712-4800-8f07-70d908f2b512'
[2017-11-14 16:34:18.811 +0000] INFO  [task-thread-4] - - - - - c.c.l.task.RefreshDeployments: Finished refreshing all pre-existing Deployment models
[2017-11-14 16:35:08.385 +0000] INFO  [task-thread-9] - - - - - c.c.l.m.r.DeploymentsReporter: Enqueueing all deployments for usage reporting
[2017-11-14 16:35:08.398 +0000] INFO  [task-thread-9] - - - - - c.c.l.m.r.DeploymentsReporter: Enqueueing 0 deployments for usage reporting
[2017-11-14 16:37:33.613 +0000] INFO  [qtp1914740115-63] bca6db0c-a58f-4846-96eb-38eef096bb76 POST /api/v10/login - - c.c.l.a.c.AuthenticationResource: Logging in admin via API
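For reference, one way to see exactly what Director has recorded for the cluster, independent of the UI, is the Director server API (v10, per the login line in the log above). The endpoint shape, port 7189, and admin credentials below are defaults from a stock install and are assumptions; the sample response is illustrative only.

```shell
# Hypothetical sketch: ask Director's API for its record of the cluster.
# Endpoint path, port, and credentials assume a default Director install.
# curl -s -u admin:admin \
#   'http://localhost:7189/api/v10/environments/wilbur/deployments/wilbur%20Deployment/clusters/wilbur'
#
# If the refreshers haven't updated the database, the recorded versions
# would still show the pre-upgrade parcel, e.g. a response like:
response='{"name":"wilbur","products":{"CDH":"5.12.1"}}'
echo "$response" | grep -o '"CDH":"[^"]*"'   # prints "CDH":"5.12.1"
```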
Contributor
Posts: 26
Registered: 10-02-2017

Re: Cloudera Director not reflecting upgraded CDH cluster version

Reconciling the cluster using the CRaSH console got things back in sync for me.

 

See the following post:  https://community.cloudera.com/t5/Cloudera-Director-Cloud-based/Director-pipeline-SUSPENDED-UPDATE-F...

 

 
