Created on 01-05-2016 07:28 PM - edited 09-16-2022 02:55 AM
While running the 'cloudera-director bootstrap' command with the Cloudera Director client, we see the following error in the logs during the 'Installing cloudera-manager-daemons package' step. This exact configuration was used to start a cluster the previous day, so this is probably something in the environment, but I can't find any indication of what the cause may be.
cloudera-director bootstrap cluster.conf
Process logs can be found at /home/ec2-user/.cloudera-director/logs/application.log
Plugins will be loaded from /var/lib/cloudera-director-client/plugins
Cloudera Director 1.5.2 initializing ...
Installing Cloudera Manager ...
* Starting ...... done
* Requesting an instance for Cloudera Manager .............................. done
* Inspecting capabilities of 10.172.4.89 .......... done
* Installing screen package (1/1) .......... done
* Running custom bootstrap script on 10.172.4.89 ......... done
* Waiting for SSH access to 10.172.4.89 on port 22 ....... done
* Inspecting capabilities of 10.172.4.89 ............... done
* Normalizing 10.172.4.89 ....... done
* Installing ntp package (1/4) ...... done
* Installing curl package (2/4) ....... done
* Installing nscd package (3/4) ....... done
* Installing gdisk package (4/4) ......................... done
* Resizing instance root partition ............ done
* Rebooting 10.172.4.89 ..... done
* Waiting for 10.172.4.89 to boot ....... done
* Mounting all instance disk drives ............ done
* Waiting for new external database servers to start running ........... done
* Installing repositories for Cloudera Manager ......... done
* Installing cloudera-manager-daemons package (1/2) .......
[2016-01-05 18:46:26] INFO [io-thread-6] - ssh:10.172.4.89: Error: No matching Packages to list
[2016-01-05 18:46:26] INFO [io-thread-6] - ssh:10.172.4.89: http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4/repodata/repomd.xml: [Errno 14] PYCURL ERROR 56 - "Failure when receiving data from the peer"
[2016-01-05 18:46:26] INFO [io-thread-6] - ssh:10.172.4.89: Trying other mirror.
[2016-01-05 18:46:26] INFO [io-thread-6] - ssh:10.172.4.89: Error: Cannot retrieve repository metadata (repomd.xml) for repository: cloudera-manager. Please verify its path and try again
[2016-01-05 18:46:27] INFO [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=500, pipeline=022613a2-a9e2-49e1-86cf-b4d35445a285, ...
[2016-01-05 18:46:27] INFO [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo yum -C list installed 'cloudera-manager-daemons' 2>&1 > /dev/null, 10.172.4.89, SshCredentials ...
[2016-01-05 18:46:27] INFO [pipeline-thread-1] - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2016-01-05 18:46:27] WARN [reader] - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 10.172.4.89 was automatically accepted
[2016-01-05 18:46:28] INFO [io-thread-6] - ssh:10.172.4.89: Error: No matching Packages to list
[2016-01-05 18:46:28] ERROR [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.common.ssh.SshException: Script execution failed with code 1.
Script: sudo yum -C list installed 'cloudera-manager-daemons' 2>&1 > /dev/null
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:45) ~[launchpad-pipeline-common-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:27) ~[launchpad-pipeline-common-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.job.Job3$$FastClassBySpringCGLIB$$54178503.invoke(<generated>) ~[spring-core-4.1.5.RELEASE.jar!/:1.5.2]
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:717) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:55) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.0.jar!/:3.1.0]
    at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:51) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.6.0_30]
    at java.lang.reflect.Method.invoke(Method.java:622) ~[na:1.6.0_30]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$cde3fbd8.runUnchecked(<generated>) ~[spring-core-4.1.5.RELEASE.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:165) [launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:136) [launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na]
    at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:98) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:120) ~[launchpad-pipeline-database-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) ~[launchpad-common-1.5.2.jar!/:1.5.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.6.0_30]
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.6.0_30]
    at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.6.0_30]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) ~[na:1.6.0_30]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.6.0_30]
    at java.lang.Thread.run(Thread.java:701) ~[na:1.6.0_30]
I am able to SSH into the server manually and run the cloudera-manager-daemons installation myself, and I get this output.
[ec2-user@i-23b5d6a2.poc-dis11.daws1 ~]$ sudo yum -C list installed 'cloudera-manager-daemons'
Loaded plugins: amazon-id, lsbvars, rhui-lb, security
Error: No matching Packages to list
[ec2-user@i-23b5d6a2.poc-dis11.daws1 ~]$ sudo yum -C install 'cloudera-manager-daemons'
Loaded plugins: amazon-id, lsbvars, rhui-lb, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package cloudera-manager-daemons.x86_64 0:5.4.9-1.cm549.p0.9.el6 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package                   Arch    Version                 Repository        Size
================================================================================
Installing:
cloudera-manager-daemons  x86_64  5.4.9-1.cm549.p0.9.el6  cloudera-manager  638 M
Transaction Summary
================================================================================
Install 1 Package(s)
Total download size: 638 M
Installed size: 902 M
Is this ok [y/N]: y
Downloading Packages:
Error Downloading Packages:
  cloudera-manager-daemons-5.4.9-1.cm549.p0.9.el6.x86_64: Caching enabled but no local cache of /var/cache/yum/x86_64/6Server/cloudera-manager/packages/cloudera-manager-daemons-5.4.9-1.cm549.p0.9.el6.x86_64.rpm from cloudera-manager
Since I know of no changes to our files, I am not sure why the yum / RPM installations from Cloudera no longer appear to be working.
Created 01-06-2016 07:15 AM
Hi Dustin -
Is this transient? It could be a temporary Cloudera CDN / download site error. Did the AWS environment change in a way that could affect outbound internet access (new security group rules, network ACLs, NAT instance changes, etc.)?
It could help to add "yum clean all" to the bootstrap script to make sure there is no stale information.
Created 01-06-2016 10:57 AM
Andrei,
This was happening all day yesterday, across at least 4 attempts. I tried again this morning, and while it got past the cloudera-manager-daemons step, it failed with basically the same error on the 'Installing cloudera-manager-server' step this time. I can check with our operations team to see if there were any network changes in AWS, but I don't think so. I also checked the security group, and all traffic is allowed by both the inbound and outbound rules.
I will add yum cleanup commands to the end of our custom bootstrap to see if that works.
yum clean all
yum makecache
Thanks
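Since the failures look intermittent, one option is to wrap the two yum commands above in a retry loop inside the custom bootstrap script. The following is only a sketch: the `retry` and `refresh_yum_metadata` helpers, the attempt count (5) and the delay (10 seconds) are illustrative assumptions and not part of Cloudera Director, and it assumes the bootstrap script runs as root (otherwise prefix the yum calls with sudo).

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "giving up after $retry_n attempts: $*" >&2
      return 1
    fi
    echo "attempt $retry_n failed; retrying in ${retry_delay}s: $*" >&2
    sleep "$retry_delay"
    retry_n=$((retry_n + 1))
  done
}

# Drop stale metadata, then rebuild the cache, retrying on transient errors.
# The values 5 and 10 are arbitrary; tune them for your environment.
refresh_yum_metadata() {
  retry 5 10 yum clean all &&
  retry 5 10 yum makecache
}
```

Calling `refresh_yum_metadata` at the end of the bootstrap script would then stand in for the two bare commands. This only smooths over short-lived fetch errors; it does not address whatever is causing them.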
Created 01-06-2016 11:25 AM
This may also happen due to one of the recent maintenance releases.
Please check the following properties for accuracy:
Created 01-06-2016 05:36 PM
Here was our initial configuration: (#1)
We also tried to update this to a specific version with these configurations: (#2)
Startup Attempts
I don't think the yum commands made a difference but it would be nice to know what we should specify in the repositories and products configurations to specify a specific version of CDH to install.
Created 01-07-2016 10:57 AM
This is an example of how you can keep strict control over the version numbers of all the different components:
cloudera-manager {
    repository: "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.5.1/"
    repositoryKeyUrl: "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera"
    [...]
}
cluster {
    products {
        CDH: 5.5.1
    }
    parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.5.1/"]
    [...]
}
It's important that when you override the parcel version you also change the Cloudera Manager repository so the two stay in sync. Older versions of Cloudera Manager do not support newer versions of CDH, and in some cases you may get hard-to-debug, non-deterministic failures.
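As a rough pre-flight check before running bootstrap, the version at the end of the Cloudera Manager repository URL can be compared with the CDH version. This is a sketch under simplifying assumptions: it treats "Cloudera Manager version is at least the CDH version" as the rule, which is a simplification of the real compatibility matrix; it assumes the repository URL ends in the version number, as in the example above; and it relies on GNU `sort -V`. The function names `version_ge` and `check_versions` are made up for illustration.

```shell
# version_ge A B: succeeds when version A >= version B (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# check_versions CM_REPOSITORY_URL CDH_VERSION
# Takes the last path component of the repository URL as the CM version
# (an assumption about the URL layout) and compares it with CDH_VERSION.
check_versions() {
  cm_version=$(basename "$1")
  if version_ge "$cm_version" "$2"; then
    echo "OK: Cloudera Manager $cm_version with CDH $2"
  else
    echo "MISMATCH: Cloudera Manager $cm_version is older than CDH $2" >&2
    return 1
  fi
}
```

For instance, `check_versions "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4/" 5.5.1` reports a mismatch, which is the combination to avoid.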
Do you have a NAT instance in your environment? Could that be affected by high load?
Created 01-08-2016 12:21 PM
Based on your response, we were able to create a cluster with a specific version. However, we are still seeing roughly a 50% failure rate during installation. If we can confirm there are no issues on Cloudera's end, then we will have to start looking into the AWS account's NAT.
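One way to narrow down where the failures come from is to fetch the repository metadata repeatedly, both from an instance behind the NAT and from a machine outside it, and compare the failure counts. The helper below is a hypothetical sketch (the name `failure_rate` and the sample size are assumptions); it simply runs a command a fixed number of times and reports how many runs failed.

```shell
# failure_rate COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND COUNT times, discarding its output,
# and prints "FAILED/COUNT" on stdout.
failure_rate() {
  fr_total=$1
  shift
  fr_failed=0
  fr_i=0
  while [ "$fr_i" -lt "$fr_total" ]; do
    if ! "$@" >/dev/null 2>&1; then
      fr_failed=$((fr_failed + 1))
    fi
    fr_i=$((fr_i + 1))
  done
  echo "$fr_failed/$fr_total"
}

# Example invocation (URL taken from the original log; 20 is an arbitrary sample size):
#   failure_rate 20 curl -fsS --max-time 30 -o /dev/null \
#     http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4/repodata/repomd.xml
```

If the count is high only from behind the NAT, that points toward the AWS network path rather than the download site. Note that repomd.xml is small, so this may under-report problems that only show up on large downloads such as the 638 M package.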
Created 01-19-2016 02:28 PM
After a week of experiencing this, the problem appears to have resolved itself. The most likely root cause was networking issues within AWS; however, we were not able to find anything wrong in our specific instance.