Cloudera Director bootstrap failing on 'Installing cloudera-manager-daemons package'
Labels: Cloudera Manager, Security
Created on 01-05-2016 07:28 PM - edited 09-16-2022 02:55 AM
While running the 'cloudera-director bootstrap' command of the Cloudera Director client, we see the following error in the logs during the 'Installing cloudera-manager-daemons package' step. This exact configuration was used to start a cluster the previous day, so the problem is probably environmental, but I can't find any indication of what the cause might be.
cloudera-director bootstrap cluster.conf
Process logs can be found at /home/ec2-user/.cloudera-director/logs/application.log
Plugins will be loaded from /var/lib/cloudera-director-client/plugins
Cloudera Director 1.5.2 initializing ...
Installing Cloudera Manager ...
* Starting ...... done
* Requesting an instance for Cloudera Manager .............................. done
* Inspecting capabilities of 10.172.4.89 .......... done
* Installing screen package (1/1) .......... done
* Running custom bootstrap script on 10.172.4.89 ......... done
* Waiting for SSH access to 10.172.4.89 on port 22 ....... done
* Inspecting capabilities of 10.172.4.89 ............... done
* Normalizing 10.172.4.89 ....... done
* Installing ntp package (1/4) ...... done
* Installing curl package (2/4) ....... done
* Installing nscd package (3/4) ....... done
* Installing gdisk package (4/4) ......................... done
* Resizing instance root partition ............ done
* Rebooting 10.172.4.89 ..... done
* Waiting for 10.172.4.89 to boot ....... done
* Mounting all instance disk drives ............ done
* Waiting for new external database servers to start running ........... done
* Installing repositories for Cloudera Manager ......... done
* Installing cloudera-manager-daemons package (1/2) .......
[2016-01-05 18:46:26] INFO [io-thread-6] - ssh:10.172.4.89: Error: No matching Packages to list
[2016-01-05 18:46:26] INFO [io-thread-6] - ssh:10.172.4.89: http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4/repodata/repomd.xml: [Errno 14] PYCURL ERROR 56 - "Failure when receiving data from the peer"
[2016-01-05 18:46:26] INFO [io-thread-6] - ssh:10.172.4.89: Trying other mirror.
[2016-01-05 18:46:26] INFO [io-thread-6] - ssh:10.172.4.89: Error: Cannot retrieve repository metadata (repomd.xml) for repository: cloudera-manager. Please verify its path and try again
[2016-01-05 18:46:27] INFO [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=500, pipeline=022613a2-a9e2-49e1-86cf-b4d35445a285, ...
[2016-01-05 18:46:27] INFO [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo yum -C list installed 'cloudera-manager-daemons' 2>&1 > /dev/null, 10.172.4.89, SshCredentials ...
[2016-01-05 18:46:27] INFO [pipeline-thread-1] - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2016-01-05 18:46:27] WARN [reader] - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 10.172.4.89 was automatically accepted
[2016-01-05 18:46:28] INFO [io-thread-6] - ssh:10.172.4.89: Error: No matching Packages to list
[2016-01-05 18:46:28] ERROR [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.common.ssh.SshException: Script execution failed with code 1. Script: sudo yum -C list installed 'cloudera-manager-daemons' 2>&1 > /dev/null
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:45) ~[launchpad-pipeline-common-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:27) ~[launchpad-pipeline-common-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.job.Job3$$FastClassBySpringCGLIB$$54178503.invoke(<generated>) ~[spring-core-4.1.5.RELEASE.jar!/:1.5.2]
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:717) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:55) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.0.jar!/:3.1.0]
    at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:51) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.6.0_30]
    at java.lang.reflect.Method.invoke(Method.java:622) ~[na:1.6.0_30]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$cde3fbd8.runUnchecked(<generated>) ~[spring-core-4.1.5.RELEASE.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:165) [launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:136) [launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na]
    at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:98) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:120) ~[launchpad-pipeline-database-1.5.2.jar!/:1.5.2]
    at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) ~[launchpad-common-1.5.2.jar!/:1.5.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.6.0_30]
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.6.0_30]
    at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.6.0_30]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) ~[na:1.6.0_30]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.6.0_30]
    at java.lang.Thread.run(Thread.java:701) ~[na:1.6.0_30]
I am able to SSH into the server manually, and when I try to run the cloudera-manager-daemons installation by hand I get this output:
[ec2-user@i-23b5d6a2.poc-dis11.daws1 ~]$ sudo yum -C list installed 'cloudera-manager-daemons'
Loaded plugins: amazon-id, lsbvars, rhui-lb, security
Error: No matching Packages to list

[ec2-user@i-23b5d6a2.poc-dis11.daws1 ~]$ sudo yum -C install 'cloudera-manager-daemons'
Loaded plugins: amazon-id, lsbvars, rhui-lb, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package cloudera-manager-daemons.x86_64 0:5.4.9-1.cm549.p0.9.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                    Arch     Version                 Repository        Size
================================================================================
Installing:
 cloudera-manager-daemons   x86_64   5.4.9-1.cm549.p0.9.el6  cloudera-manager  638 M

Transaction Summary
================================================================================
Install       1 Package(s)

Total download size: 638 M
Installed size: 902 M
Is this ok [y/N]: y
Downloading Packages:

Error Downloading Packages:
  cloudera-manager-daemons-5.4.9-1.cm549.p0.9.el6.x86_64: Caching enabled but no local cache of /var/cache/yum/x86_64/6Server/cloudera-manager/packages/cloudera-manager-daemons-5.4.9-1.cm549.p0.9.el6.x86_64.rpm from cloudera-manager
Since I know of no changes in our configuration files, I am not sure why the yum/rpm installations from Cloudera no longer appear to be working.
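For reference, the equivalent manual checks on the instance would be something along these lines (a rough sketch, reusing the CM 5.4 repo URL from the error log above; dropping the -C flag lets yum actually download instead of relying on the local cache):

# Confirm the Cloudera Manager repo metadata is reachable from this instance
curl -sSI http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4/repodata/repomd.xml

# Throw away any partial or stale metadata and rebuild the yum cache
sudo yum clean all
sudo yum makecache

# Retry the install without -C so yum is allowed to hit the network
sudo yum install -y cloudera-manager-daemons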
Created 01-06-2016 07:15 AM
Hi Dustin -
Is this transient? It could be a temporary Cloudera CDN / download site error. Did the AWS environment change in a way that could impact outbound internet access (new security group rules, network ACLs, NAT instance changes, etc.)?
It could help to add "yum clean all" to the custom bootstrap script to make sure there is no stale repository metadata.
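A minimal version of such a bootstrap script might look like this (a sketch only; the contents are illustrative, not a specific Director requirement):

#!/bin/sh
# Illustrative custom bootstrap script for the Director-managed instance:
# drop any stale yum metadata before Director starts installing packages.
yum clean all
yum makecache
exit 0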
Created 01-06-2016 10:57 AM
Andrei,
This was happening all day yesterday, with at least 4 attempts made. I tried again this morning, and while it passed the cloudera-manager-daemons step, it failed with essentially the same error on the 'Installing cloudera-manager-server' step. I can check with our operations team to see if there were any network changes in AWS, but I don't think so. I also checked the security group, and all traffic is allowed on both the inbound and outbound rules.
I will add the following yum cleanup commands to the end of our custom bootstrap script to see if that helps:
yum clean all
yum makecache
Thanks
Created 01-06-2016 11:25 AM
This may also be related to one of the recent maintenance releases.
Please check the following properties for accuracy:
- within the cloudera-manager {} block: repository and repositoryKeyUrl
- within the cluster {} block: products {} and parcelRepositories
Created 01-06-2016 05:36 PM
Here was our initial configuration (#1):
- no `repository`, `repositoryKeyUrl`, or `parcelRepositories` settings specified in the cloudera-manager block
- inside cluster {} -> products {} we had 'CDH: 5' configured
We also tried updating this to a specific version with the following configuration (#2):
- only `parcelRepositories` specified, set to http://archive.cloudera.com/cdh5/parcels/5.3.3/
- no `repository` or `repositoryKeyUrl` settings specified
- inside cluster {} -> products {} we have 'CDH: 5.3.3' configured
Startup Attempts
- The first attempt today with configuration #1 was unsuccessful.
- Another attempt with configuration #1, after adding `yum clean all` and `yum makecache`, was successful.
- We updated to configuration #2, keeping the yum commands, and ran into the same error.
I don't think the yum commands made a difference, but it would be nice to know what we should specify in the repositories and products configurations to pin a specific version of CDH to install.
Created 01-07-2016 10:57 AM
Here is an example of how you can keep strict control over the version numbers of all the different components:
cloudera-manager {
    repository: "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.5.1/"
    repositoryKeyUrl: "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera"
    [...]
}

cluster {
    products {
        CDH: 5.5.1
    }
    parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.5.1/"]
    [...]
}
It's important that when you override the parcel version you also change the Cloudera Manager repository to keep the two in sync. Older versions of Cloudera Manager do not support newer versions of CDH, and in some cases you may get hard-to-debug, non-deterministic failures.
Do you have a NAT instance in your environment? Could that be affected by high load?
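One quick way to sanity-check the network path through the NAT (a sketch, reusing the 5.5.1 repo URL from the example above) is to time a metadata fetch from one of the cluster instances and see whether the transfer stalls or crawls:

# Time a single fetch of the CM repo metadata from a cluster instance;
# a stalled or very slow transfer points at the network path (e.g. the NAT)
# rather than at the repository itself.
curl -s -o /dev/null \
  -w "http %{http_code}  total %{time_total}s  speed %{speed_download} B/s\n" \
  http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.5.1/repodata/repomd.xml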
Created 01-08-2016 12:21 PM
Based on your response, we were able to get a specific version of the cluster created. However, we are still seeing roughly a 50% failure rate during installation. If we can confirm there are no issues on Cloudera's end, we will have to start looking into problems with the AWS account's NAT.
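To put a number on that failure rate independently of Director, a loop along these lines (a rough sketch, run from one of the instances behind the NAT) fetches the repo metadata repeatedly and tallies the HTTP results:

# Fetch the CM repo metadata 20 times and count the response codes;
# "000" entries are transfers that failed outright (e.g. curl error 56).
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" --max-time 30 \
    http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.5.1/repodata/repomd.xml
done | sort | uniq -c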
Created 01-19-2016 02:28 PM
After experiencing this for a week, it appears to have resolved itself. The most likely root cause was networking issues within AWS, though we were not able to find anything conclusive in our specific case.
