Cloudera Manager bootstrap failing on 'Installing screen package'
Labels: Cloudera Manager
Created on ‎01-28-2016 02:08 PM - edited ‎09-16-2022 03:00 AM
I'm attempting to bootstrap Cloudera Manager and a cluster; all nodes fail when the bootstrap attempts to install the 'screen' package. I have tried the following two AWS AMIs:
ami-414b7271 (RHEL 6.6, default option for the c34 template)
ami-11125e21 (RHEL 6.5)
[2016-01-28 16:13:56] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Pipeline 33280be1-0db3-403e-b15f-e59c9b10a1ca suspended due to failure
com.cloudera.launchpad.common.ssh.SshException: Script execution failed with code 1. Script: sudo yum -C list installed 'screen' 2>&1 > /dev/null && echo "Package screen is already installed and upgrades are not forced. Skipping." || sudo yum install -d 1 --assumeyes 'screen'
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:45) ~[launchpad-pipeline-common-2.0.0.jar!/:2.0.0]
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:27) ~[launchpad-pipeline-common-2.0.0.jar!/:2.0.0]
    at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
    at com.cloudera.launchpad.pipeline.job.Job3$$FastClassBySpringCGLIB$$54178503.invoke(<generated>) ~[spring-core-4.1.6.RELEASE.jar!/:2.0.0]
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:717) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:67) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
    at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.0.jar!/:3.1.0]
    at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:63) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
    at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_65]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_65]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
    at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$6f647027.runUnchecked(<generated>) ~[spring-core-4.1.6.RELEASE.jar!/:2.0.0]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:159) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:130) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
    at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na]
    at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na]
    at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:99) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
    at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:125) ~[launchpad-pipeline-database-2.0.0.jar!/:2.0.0]
    at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) [launchpad-common-2.0.0.jar!/:2.0.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_65]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
[2016-01-28 16:13:56] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Pipeline '33280be1-0db3-403e-b15f-e59c9b10a1ca' failed at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$6f647027 at com.cloudera.launchpad.bootstrap.InstallPackages.InstallOrUpgradePackage:1
[2016-01-28 16:13:56] INFO [pipeline-thread-1] - c.c.l.p.s.PipelineRepositoryService: Pipeline '33280be1-0db3-403e-b15f-e59c9b10a1ca': RUNNING -> SUSPENDED
[2016-01-28 16:13:56] INFO [pipeline-thread-1] - c.c.l.d.DeploymentRepositoryService: Deployment 'manager': BOOTSTRAPPING -> BOOTSTRAP_FAILED
Created ‎01-28-2016 03:24 PM
If you SSH into the instance and run those commands manually, it should become apparent what is going wrong. It could be a network configuration issue that prevents the instance from reaching the yum repo, for example.
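If running yum by hand only shows a timeout, a plain TCP check from the instance can help separate "no outbound route" from a yum-specific problem. The following is a minimal sketch using only the Python standard library; the function name `can_connect` and the default port 443 are assumptions for illustration, not something Cloudera Director provides.

```python
import socket
import sys


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens the socket in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_conn.py <host> [<host> ...]
    for host in sys.argv[1:]:
        status = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{host}: {status}")
```

Run it on the failing instance against the repository host that yum reports in its error. If the host is not reachable from a private subnet, the NAT or routing setup is the likely place to look.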
Created ‎01-29-2016 11:50 AM
Created on ‎01-29-2016 11:53 AM - edited ‎01-29-2016 11:54 AM
I'm glad my suggestion for how to diagnose was helpful. Thanks for commenting on what turned out to be the specific problem, in case your solution is helpful to another user in the future.
Created ‎01-29-2016 12:36 PM
The specific error we received:
could not contact CDS load balancer rhui2-cds01.us-west-2.aws.ce.redhat.com
I was using the following CloudFormation template to configure our cluster:
http://docs.aws.amazon.com/quickstart/latest/cloudera/step2b.html
It provisions a NAT instance through which outbound traffic from the private subnet is routed; the private subnet is where the Cloudera instances are deployed.
It's still not clear why the original NAT instance wasn't providing outbound access to the internet.
We removed the NAT instance and replaced it with a NAT gateway, which is a newer AWS product:
https://aws.amazon.com/blogs/aws/new-managed-nat-network-address-translation-gateway-for-aws/
Replacing the original NAT instance entries on the private subnet's routing table with the identifier of the NAT gateway resolved the issue.
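For anyone who prefers to script the routing change rather than use the console, here is a rough sketch with boto3. It assumes boto3 is installed and AWS credentials are configured; the helper names and the IDs in the usage comment are hypothetical placeholders, and it assumes the private subnet's default route (0.0.0.0/0) is the entry that pointed at the NAT instance.

```python
def nat_gateway_route_args(route_table_id: str, nat_gateway_id: str,
                           destination_cidr: str = "0.0.0.0/0") -> dict:
    """Build the keyword arguments for EC2 replace_route.

    destination_cidr defaults to the default route, which is an assumption
    about which entry previously targeted the NAT instance.
    """
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": destination_cidr,
        "NatGatewayId": nat_gateway_id,
    }


def point_route_at_nat_gateway(route_table_id: str, nat_gateway_id: str,
                               region: str) -> None:
    """Replace the existing route so that it targets the NAT gateway."""
    import boto3  # third-party; imported here so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region)
    ec2.replace_route(**nat_gateway_route_args(route_table_id, nat_gateway_id))


# Example call with placeholder IDs (replace with your own):
# point_route_at_nat_gateway("rtb-EXAMPLE", "nat-EXAMPLE", "us-west-2")
```

Replacing the route in place keeps the rest of the private subnet's routing table untouched, which matches the change described above.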
Created on ‎11-08-2016 02:46 PM - edited ‎11-08-2016 02:46 PM
I experienced this issue when setting the parameter
associatePublicIpAddresses: false
The default seems to be 'true'.
The notes for this parameter say ...
# Whether to associate a public IP address with instances or not. If this is false
# we expect instances to be able to access the internet using a NAT instance
#
# Currently the only way to get optimal S3 data transfer performance is to assign
# public IP addresses to your instances and not use NAT (public subnet type of setup)
#
# See: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html
So it makes sense that if you set this parameter to false, you may need to configure NAT as mentioned above.
