Created on 01-28-2016 02:08 PM - edited 09-16-2022 03:00 AM
Attempting to bootstrap cloudera manager and a cluster; all nodes are failing when the bootstrap attempts to install the 'screen' package. Have attempted with the following two aws AMI's:
ami-414b7271 (RHEL 6.6, default option for the c34 template)
ami-11125e21 (RHEL 6.5)
[2016-01-28 16:13:56] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Pipeline 33280be1-0db3-403e-b15f-e59c9b10a1ca suspended due to failure com.cloudera.launchpad.common.ssh.SshException: Script execution failed with code 1. Script: sudo yum -C list installed 'screen' 2>&1 > /dev/null && echo "Package screen is already installed and upgrades are not forced. Skipping." || sudo yum install -d 1 --assumeyes 'screen' at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:45) ~[launchpad-pipeline-common-2.0.0.jar!/:2.0.0] at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:27) ~[launchpad-pipeline-common-2.0.0.jar!/:2.0.0] at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0] at com.cloudera.launchpad.pipeline.job.Job3$$FastClassBySpringCGLIB$$54178503.invoke(<generated>) ~[spring-core-4.1.6.RELEASE.jar!/:2.0.0] at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:717) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:67) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0] at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.0.jar!/:3.1.0] at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:63) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0] at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) ~[na:na] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_65] at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_65] at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE] at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$6f647027.runUnchecked(<generated>) ~[spring-core-4.1.6.RELEASE.jar!/:2.0.0] at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:159) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0] at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:130) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0] at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na] at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na] at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:99) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0] at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:125) ~[launchpad-pipeline-database-2.0.0.jar!/:2.0.0] at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) [launchpad-common-2.0.0.jar!/:2.0.0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_65] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65] [2016-01-28 16:13:56] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Pipeline '33280be1-0db3-403e-b15f-e59c9b10a1ca' failed at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$6f647027 at com.cloudera.launchpad.bootstrap.InstallPackages.InstallOrUpgradePackage:1 [2016-01-28 16:13:56] INFO [pipeline-thread-1] - c.c.l.p.s.PipelineRepositoryService: Pipeline '33280be1-0db3-403e-b15f-e59c9b10a1ca': RUNNING -> SUSPENDED [2016-01-28 16:13:56] INFO [pipeline-thread-1] - c.c.l.d.DeploymentRepositoryService: Deployment 'manager': BOOTSTRAPPING -> BOOTSTRAP_FAILED
Created 01-28-2016 03:24 PM
If you ssh into the instance and try to execute those commands manually, it should be apparent what is going wrong. Could be network configuration issues trying to talk to the yum repo, etc.
Created 01-28-2016 03:24 PM
If you ssh into the instance and try to execute those commands manually, it should be apparent what is going wrong. Could be network configuration issues trying to talk to the yum repo, etc.
Created 01-29-2016 11:50 AM
Created on 01-29-2016 11:53 AM - edited 01-29-2016 11:54 AM
I'm glad my suggestion for how to diagnose was helpful. Thanks for commenting on what turned out to be the specific problem, in case your solution is helpful to another user in the future.
Created 01-29-2016 12:36 PM
The specific error we received:
could not contact CDS load balancer rhui2-cds01.us-west-2.aws.ce.redhat.com
I was using the following cloudformation template for configuring our cluster:
http://docs.aws.amazon.com/quickstart/latest/cloudera/step2b.html
It provisions a NAT instance which traffic to the private subnet is proxied through; the private subnet is where the cloudera instances are deployed.
What's not clear is why the original NAT instance wasn't providing outbound access to the internet.
The NAT instance was removed and replaced with a NAT gateway which is a newer AWS product:
https://aws.amazon.com/blogs/aws/new-managed-nat-network-address-translation-gateway-for-aws/
Replacing the original NAT instance entries on the private subnet's routing table with the identifier of the NAT gateway resolved the issue.
Created on 11-08-2016 02:46 PM - edited 11-08-2016 02:46 PM
I experienced this issue when setting the parameter
associatePublicIpAddresses: false
The default seems to be 'true'.
The notes for this parameter say ...
# Whether to associate a public IP address with instances or not. If this is false
# we expect instances to be able to access the internet using a NAT instance
#
# Currently the only way to get optimal S3 data transfer performance is to assign
# public IP addresses to your instances and not use NAT (public subnet type of setup)
#
# See: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html
So it makes sense that if you attempt to set this parameter to false that you may need to configure NAT as mentioned above.