Member since: 10-02-2017
Posts: 116
Kudos Received: 3
Solutions: 8

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 998 | 07-18-2020 12:04 PM |
| | 1724 | 09-11-2019 01:14 PM |
| | 2487 | 08-16-2019 08:17 AM |
| | 6338 | 08-15-2019 12:23 PM |
| | 4183 | 05-14-2019 08:48 AM |
11-14-2017
11:41 AM
I stopped and started cloudera-director-server in an attempt to "force" a refresh of its cluster data. Tailing the application log, I can see that it successfully communicates with my cluster, but it reports the state as 'not ready' because of the failed attempt to add a worker node (mentioned earlier in the thread). We plan on using Director to manage customer clusters, and I'm concerned by how easily these systems get out of sync, especially since I didn't perform any actions that are documented to cause sync issues. My cluster is up and healthy, and Director should be able to determine this. How do I recover? In the log below, 172.20.108.55 is the Cloudera Manager IP of my CDH cluster.

[2017-11-14 16:34:14.206 +0000] INFO [main] - - - - - c.c.l.p.autorepair.AutoRepairService: Adding auto-repair policy runner for ClusterKey{environmentName='wilbur', deploymentName='wilbur Deployment', clusterName='wilbur'}
[2017-11-14 16:34:14.216 +0000] INFO [main] - - - - - c.c.l.p.autorepair.PolicyHandler: Cluster is not ready. Skipping policies evaluation.
[2017-11-14 16:34:14.217 +0000] INFO [main] - - - - - com.cloudera.launchpad.Server: Started Server in 27.715 seconds (JVM running for 28.311)
[2017-11-14 16:34:14.271 +0000] INFO [io-thread-1] - - - - - ssh:172.20.108.55: https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.13.0/RPMS/x86_64/cloudera-manager-server-5.13.0-1.cm5130.p0.55.el7.x86_64.rpm
[2017-11-14 16:34:14.435 +0000] INFO [task-thread-4] - - - - - c.c.l.task.RefreshDeployments: Backing up Cloudera Manager configuration for Deployment wilbur:wilbur Deployment
[2017-11-14 16:34:14.466 +0000] INFO [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Starting pipeline 'd135b711-6712-4800-8f07-70d908f2b512' with root job com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig and listener com.cloudera.launchpad.pipeline.listener.NoopPipelineStageListener
[2017-11-14 16:34:14.571 +0000] INFO [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Create new runner thread for pipeline 'd135b711-6712-4800-8f07-70d908f2b512'
[2017-11-14 16:34:14.784 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig - c.c.l.pipeline.util.PipelineRunner: >> BackupClouderaManagerConfig/3 [PluggableComputeInstance{ipAddress=172.20.108.55, delegate=null, hostEndpoints=[HostEndpoint{hostAd ...
[2017-11-14 16:34:15.097 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404748, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:15.206 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo tar -cpzf /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz $(sudo ls -d /etc/cloudera ...
[2017-11-14 16:34:15.206 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:15.249 +0000] WARN [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:15.771 +0000] INFO [io-thread-1] - - - - - ssh:172.20.108.55: ls: cannot access /var/lib/cloudera-scm-agent/agent-cert: No such file or directory
[2017-11-14 16:34:15.771 +0000] INFO [io-thread-1] - - - - - ssh:172.20.108.55: ls: cannot access /var/lib/cloudera-scm-server/certmanager: No such file or directory
[2017-11-14 16:34:15.771 +0000] INFO [io-thread-1] - - - - - ssh:172.20.108.55: tar: Removing leading `/' from member names
[2017-11-14 16:34:15.772 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:15.850 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo chown skynet /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172 ...
[2017-11-14 16:34:15.850 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:15.881 +0000] WARN [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:16.436 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:16.560 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.l.pipeline.util.PipelineRunner: >> DownloadFileAsByteArrayJob/3 [/tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172-24-109-63.va.r4cl ...
[2017-11-14 16:34:16.561 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:16.602 +0000] WARN [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:17.087 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: permissions = 600
[2017-11-14 16:34:17.087 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: mtime = 1510677255
[2017-11-14 16:34:17.087 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.launchpad.sshj.SshJClient$3: atime = 1510677255
[2017-11-14 16:34:17.127 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.DownloadFileAsByteArrayJob - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404749, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:17.292 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: >> SshJobUncheckedWithOutputLogging/3 [sudo rm /tmp/cmbackup-56347771-b48b-4399-bcef-459a5bcc3e2e.tar.gz, [172.20.108.55, ip-172-20-108-55
[2017-11-14 16:34:17.293 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2017-11-14 16:34:17.357 +0000] WARN [reader] - - - - - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 172.20.108.55 was automatically accepted
[2017-11-14 16:34:18.128 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.pipeline.ssh.SshJobUncheckedWithOutputLogging - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=404750, pipeline=d135b711-6712-4800-8f07-70d908f2b51 ...
[2017-11-14 16:34:18.249 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig$SaveConfigBlobToDatabase - c.c.l.pipeline.util.PipelineRunner: >> BackupClouderaManagerConfig$SaveConfigBlobToDatabase/3 [[B@332d0fac, wilbur, wilbur Deployment]
[2017-11-14 16:34:18.284 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - com.cloudera.launchpad.bootstrap.deployment.BackupClouderaManagerConfig$SaveConfigBlobToDatabase - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-11-14 16:34:18.343 +0000] INFO [p-70d908f2b512-BackupClouderaManagerConfig] - - - - - c.c.l.p.s.PipelineRepositoryService: Pipeline 'd135b711-6712-4800-8f07-70d908f2b512': RUNNING -> COMPLETED
[2017-11-14 16:34:18.691 +0000] INFO [task-thread-4] - - - - - c.c.l.p.DatabasePipelineService: Deleting pipeline 'd135b711-6712-4800-8f07-70d908f2b512'
[2017-11-14 16:34:18.811 +0000] INFO [task-thread-4] - - - - - c.c.l.task.RefreshDeployments: Finished refreshing all pre-existing Deployment models
[2017-11-14 16:35:08.385 +0000] INFO [task-thread-9] - - - - - c.c.l.m.r.DeploymentsReporter: Enqueueing all deployments for usage reporting
[2017-11-14 16:35:08.398 +0000] INFO [task-thread-9] - - - - - c.c.l.m.r.DeploymentsReporter: Enqueueing 0 deployments for usage reporting
[2017-11-14 16:37:33.613 +0000] INFO [qtp1914740115-63] bca6db0c-a58f-4846-96eb-38eef096bb76 POST /api/v10/login - - c.c.l.a.c.AuthenticationResource: Logging in admin via API
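As a sanity check, the cluster state and CDH version can be confirmed directly against the Cloudera Manager REST API that Director is talking to in the log above. A minimal sketch, assuming the default CM port, API v10, and admin credentials (adjust for your deployment):

```python
# Minimal sketch: ask Cloudera Manager directly what it thinks of the cluster,
# independent of what Cloudera Director shows. Host is the CM IP from the log;
# port, API version, and credentials are assumptions.
import requests

CM = "http://172.20.108.55:7180/api/v10"
AUTH = ("admin", "admin")  # assumed credentials

clusters = requests.get(f"{CM}/clusters", auth=AUTH)
clusters.raise_for_status()

for cluster in clusters.json().get("items", []):
    print(cluster.get("displayName"), cluster.get("fullVersion"))
    # Per-service health summary as reported by CM (e.g. GOOD / CONCERNING / BAD)
    services = requests.get(f"{CM}/clusters/{cluster['name']}/services", auth=AUTH).json()
    for svc in services.get("items", []):
        print("  ", svc.get("name"), svc.get("healthSummary"))
```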
11-14-2017
11:20 AM
Thanks. Just to clarify, the parcels in question were not downloaded / activated in the cluster.
11-14-2017
11:18 AM
I removed the Sqoop repo and that did the trick! The parcel errors for Accumulo and Spark cleared out as well. Upon checking for new parcels, I no longer see the Java exceptions in the Cloudera Manager logs.
11-14-2017
11:03 AM
Thanks for the reply. What was problematic about this URL? I see it in our list of parcel URLs, with an "Available Remotely" status. I do see the following parcel errors:

Error for parcel ACCUMULO-1.4.4-1.cdh4.5.0.p0.65-el7 : Parcel not available for OS Distribution RHEL7.
Error for parcel SPARK-0.9.0-1.cdh4.6.0.p0.98-el7 : Parcel not available for OS Distribution RHEL7.

I've removed these from the parcel config, but they still show up in the parcels list. Did you need to restart your agents?
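If it helps, the repository URLs Cloudera Manager currently has configured can be read back over its REST API, which makes it easy to confirm whether the removed repos are really gone from the config. A rough sketch; host, port, credentials, and API version are assumptions:

```python
# Sketch: read back the remote parcel repository URLs configured in Cloudera
# Manager after editing the parcel settings. Host and credentials are assumptions.
import requests

CM = "http://172.20.108.55:7180/api/v10"
AUTH = ("admin", "admin")

config = requests.get(f"{CM}/cm/config", auth=AUTH)
config.raise_for_status()

for item in config.json().get("items", []):
    # The default view returns settings changed from their defaults; the repo
    # URL list appears here once it has been edited.
    if item.get("name") == "REMOTE_PARCEL_REPO_URLS":
        # The value is a single comma-separated string of repository URLs.
        for url in item.get("value", "").split(","):
            print(url.strip())
```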
11-14-2017
08:14 AM
I can see queries from my Director server in the Cloudera Manager logs, so it would appear they are still talking to one another. Director reports the cluster as healthy, but doesn't reflect the new version. After the failed attempt to add a worker node using the Director UI (mentioned in my other post), the Director console no longer reports the status of my cluster services, nor do I have any available cluster actions other than clone / terminate. I don't even have the option to clean up the failed worker node from the Director UI. Meanwhile, the cluster is healthy as reported by the Cloudera Manager UI.

Director: (screenshot)

Cloudera Manager: (screenshot)

According to this page, Director should refresh its information after an upgrade of a CDH cluster. I waited at least an hour after the upgrade before checking Cloudera Director and trying to "grow" my worker nodes.
11-14-2017
07:42 AM
Hi. Application logs on the Cloudera Director server show no errors or warnings other than the SSH key auto-accept warnings.
11-13-2017
02:59 PM
Background: I have a 10-node cluster running CDH 5.13.0 in AWS, deployed using the Cloudera Director 2.6 client / remote-bootstrap. The cluster was originally bootstrapped with v5.12.1 but has since been upgraded to 5.13.0 using parcels / Cloudera Manager.

Problem: I tried testing the cluster resize option by adding an instance to our worker-scale role. A new instance spawns in AWS, but it fails to add to the cluster with the error: Unable to find all requested parcels on the cluster. Please see my other post about Cloudera Director not reflecting the correct version of my CDH cluster. Could this mismatch have something to do with the failure to add a node to the cluster?
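For anyone debugging the same error, the parcels the cluster actually has (and their stages) can be listed via the Cloudera Manager API; a newly added worker should find the ACTIVATED parcels there. A rough sketch, with host, port, credentials, and cluster name assumed from the earlier log:

```python
# Sketch: list the parcels CM tracks for the cluster and their stage
# (AVAILABLE_REMOTELY, DOWNLOADED, DISTRIBUTED, ACTIVATED, ...), to see which
# parcels a new worker would be expected to find. Host, credentials, and
# cluster name are assumptions.
import requests

CM = "http://172.20.108.55:7180/api/v10"
AUTH = ("admin", "admin")
CLUSTER = "wilbur"  # cluster name as it appears in the Director log

parcels = requests.get(f"{CM}/clusters/{CLUSTER}/parcels", auth=AUTH)
parcels.raise_for_status()

for p in parcels.json().get("items", []):
    print(p.get("product"), p.get("version"), p.get("stage"))
```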
11-13-2017
02:30 PM
Hi. I have a 10-node CDH cluster that I deployed using the Cloudera Director (2.6) client remote-bootstrap. I've since upgraded Cloudera Manager and CDH from 5.12.1 to 5.13.0. Cloudera Director does not reflect the new cluster version; instead, it shows the parcel versions the cluster was originally bootstrapped with. How can I update Cloudera Director to reflect the correct version of my cluster?
Labels:
- Cloudera Manager
11-12-2017
04:42 PM
1 Kudo
More details. This cluster was deployed using the Cloudera Director client (bootstrap-remote). On the Cloudera Manager node, I see the following errors in cloudera-scm-server.log:

2017-11-13 00:49:57,323 INFO 742270861@scm-web-119:com.cloudera.server.web.cmf.ParcelController: Synchronizing repos based on user request admin
2017-11-13 00:49:58,080 WARN ParcelUpdateService:com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable: Error while executing CmfEntityManager task
java.lang.NullPointerException
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
        at com.google.common.collect.Collections2.filter(Collections2.java:92)
        at com.cloudera.parcel.components.ParcelDownloaderImpl$RepositoryInfo.getParcelsWithValidNames(ParcelDownloaderImpl.java:673)
        at com.cloudera.parcel.components.ParcelDownloaderImpl$RepositoryInfo.getSortedParcels(ParcelDownloaderImpl.java:691)
        at com.cloudera.parcel.components.ParcelDownloaderImpl.syncRemoteRepos(ParcelDownloaderImpl.java:368)
        at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:438)
        at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:433)
        at com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable.call(ReadWriteDatabaseTaskCallable.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2017-

I've seen similar threads attribute parcel problems to proxy settings; however, I don't have problems adding parcels for other products, e.g. Kudu and Kafka.
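The NullPointerException happens while CM is syncing the remote repos, which is consistent with one of the configured repositories returning something it cannot parse. A rough way to find the offending repo is to probe each configured URL for a readable manifest.json (assuming the standard layout of Cloudera parcel repositories; the URL list below is illustrative, substitute the entries from your own parcel configuration):

```python
# Rough sketch: probe each remote parcel repository for a readable manifest.json
# to find the repo that breaks the sync. The URLs listed are illustrative only.
import requests

repo_urls = [
    "https://archive.cloudera.com/cdh5/parcels/5.13.0/",
    # ...the other URLs from Cloudera Manager's parcel settings
]

for repo in repo_urls:
    manifest_url = repo.rstrip("/") + "/manifest.json"
    try:
        resp = requests.get(manifest_url, timeout=15)
        resp.raise_for_status()
        count = len(resp.json().get("parcels", []))
        print(f"OK   {manifest_url} ({count} parcels)")
    except Exception as exc:  # diagnostic script, so report and keep going
        print(f"FAIL {manifest_url}: {exc}")
```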
11-12-2017
03:15 PM
1 Kudo
Hi there.
I'm attempting to upgrade our CDH cluster from v5.12.1 to v5.13.0. I've successfully updated CM and all agents to v5.13.0 in preparation for the upgrade. I've added an additional CDH parcel URL and checked for new parcels, but so far I've had no luck getting the Cloudera Manager console to show new parcels available for CDH 5.13.0.
I've tried the following parcel URLs so far with no luck:
https://archive.cloudera.com/cdh5/parcels/{latest_supported}/
https://archive.cloudera.com/cdh5/parcels/5.13/
https://archive.cloudera.com/cdh5/parcels/5.13.0/
Am I missing something here?
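One thing worth checking outside of CM is whether the repository URL actually publishes an el7 parcel for this release. A small sketch that fetches the manifest (assuming the standard manifest.json layout of Cloudera parcel repositories):

```python
# Sketch: check whether the 5.13.0 parcel repo publishes an el7 (RHEL 7)
# parcel, before expecting it to appear on the CM parcels page.
import requests

url = "https://archive.cloudera.com/cdh5/parcels/5.13.0/manifest.json"
manifest = requests.get(url, timeout=15).json()

for parcel in manifest.get("parcels", []):
    name = parcel.get("parcelName", "")
    if name.endswith("el7.parcel"):
        print(name)
```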
Labels:
- Cloudera Manager