Cloudera Manager bootstrap failing on 'Installing screen package'


I'm attempting to bootstrap Cloudera Manager and a cluster; all nodes fail when the bootstrap tries to install the 'screen' package. I have tried the following two AWS AMIs:

 

ami-414b7271 (RHEL 6.6, default option for the c34 template)
ami-11125e21 (RHEL 6.5)

[2016-01-28 16:13:56] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Pipeline 33280be1-0db3-403e-b15f-e59c9b10a1ca suspended due to failure
com.cloudera.launchpad.common.ssh.SshException: Script execution failed with code 1. Script: sudo yum -C list installed 'screen' 2>&1 > /dev/null && echo "Package screen is already installed and upgrades are not forced.  Skipping." || sudo yum install -d 1 --assumeyes 'screen'
	at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:45) ~[launchpad-pipeline-common-2.0.0.jar!/:2.0.0]
	at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:27) ~[launchpad-pipeline-common-2.0.0.jar!/:2.0.0]
	at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
	at com.cloudera.launchpad.pipeline.job.Job3$$FastClassBySpringCGLIB$$54178503.invoke(<generated>) ~[spring-core-4.1.6.RELEASE.jar!/:2.0.0]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:717) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:67) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
	at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.0.jar!/:3.1.0]
	at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:63) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
	at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_65]
	at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_65]
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653) ~[spring-aop-4.1.6.RELEASE.jar!/:4.1.6.RELEASE]
	at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$6f647027.runUnchecked(<generated>) ~[spring-core-4.1.6.RELEASE.jar!/:2.0.0]
	at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:159) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
	at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:130) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
	at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na]
	at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na]
	at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:99) ~[launchpad-pipeline-2.0.0.jar!/:2.0.0]
	at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:125) ~[launchpad-pipeline-database-2.0.0.jar!/:2.0.0]
	at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) [launchpad-common-2.0.0.jar!/:2.0.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_65]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
[2016-01-28 16:13:56] ERROR [pipeline-thread-1] - c.c.l.p.DatabasePipelineRunner: Pipeline '33280be1-0db3-403e-b15f-e59c9b10a1ca' failed
	at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$6f647027
	at com.cloudera.launchpad.bootstrap.InstallPackages.InstallOrUpgradePackage:1

[2016-01-28 16:13:56] INFO  [pipeline-thread-1] - c.c.l.p.s.PipelineRepositoryService: Pipeline '33280be1-0db3-403e-b15f-e59c9b10a1ca': RUNNING -> SUSPENDED
[2016-01-28 16:13:56] INFO  [pipeline-thread-1] - c.c.l.d.DeploymentRepositoryService: Deployment 'manager': BOOTSTRAPPING -> BOOTSTRAP_FAILED

 

1 ACCEPTED SOLUTION

Expert Contributor

If you SSH into the instance and run those commands manually, it should become apparent what is going wrong. It could be, for example, a network configuration problem that prevents the instance from reaching the yum repository.
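
One way to follow this suggestion is to rerun each half of the bootstrap script by hand so that yum's output and exit code are visible. The sketch below is only an illustration, not part of Cloudera Director; `run_step` is a hypothetical helper name.

```shell
# Hypothetical helper: run one command, show its combined output,
# and report the exit code so a failing step is easy to spot.
run_step() {
    echo "+ $*"
    step_rc=0
    # Merge stderr into stdout so error messages are not lost.
    "$@" 2>&1 || step_rc=$?
    echo "exit code: $step_rc"
    return "$step_rc"
}
```

On a failing node, after defining the function, the two steps from the log can be replayed with `run_step sudo yum -C list installed screen || run_step sudo yum install -d 1 --assumeyes screen`.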

 


Our NAT instance was configured incorrectly. Switching to AWS' new NAT Gateway within the VPC wizard resolved this issue.

Expert Contributor

I'm glad my suggestion for how to diagnose was helpful. Thanks for commenting on what turned out to be the specific problem, in case your solution is helpful to another user in the future.

 


The specific error we received:

could not contact CDS load balancer rhui2-cds01.us-west-2.aws.ce.redhat.com
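
To check whether another node is hitting the same failure, saved yum output can be searched for this message. This is a minimal sketch; `is_rhui_unreachable` is a hypothetical function name, and the log path in the usage note is a placeholder.

```shell
# Hypothetical check: read yum output on stdin and succeed only if it
# contains the RHUI load balancer error quoted above.
is_rhui_unreachable() {
    # Output is discarded; only the exit status matters.
    grep -i 'could not contact CDS load balancer' > /dev/null
}
```

For example, after `sudo yum install -d 1 --assumeyes screen > /tmp/yum.log 2>&1`, running `is_rhui_unreachable < /tmp/yum.log && echo "RHUI repo unreachable"` would flag the node.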




I was using the following CloudFormation template to configure our cluster:

http://docs.aws.amazon.com/quickstart/latest/cloudera/step2b.html

It provisions a NAT instance through which outbound traffic from the private subnet is routed; the private subnet is where the Cloudera instances are deployed.

What's not clear is why the original NAT instance wasn't providing outbound access to the internet.


The NAT instance was removed and replaced with a NAT gateway, which is a newer AWS product:

https://aws.amazon.com/blogs/aws/new-managed-nat-network-address-translation-gateway-for-aws/

Replacing the original NAT instance entries on the private subnet's routing table with the identifier of the NAT gateway resolved the issue.
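
For reference, the same route change can probably be made from the AWS CLI with `aws ec2 replace-route`. The sketch below only prints the command rather than running it. It assumes the private subnet's default route is `0.0.0.0/0`, and the route table and NAT gateway IDs are placeholders, not values from this thread.

```shell
# Hypothetical helper: print (do not run) the AWS CLI call that points the
# default route of a route table at a NAT gateway.
# $1 = route table ID, $2 = NAT gateway ID (both placeholders).
replace_default_route_cmd() {
    route_table_id=$1
    nat_gateway_id=$2
    printf 'aws ec2 replace-route --route-table-id %s --destination-cidr-block 0.0.0.0/0 --nat-gateway-id %s\n' \
        "$route_table_id" "$nat_gateway_id"
}
```

Calling `replace_default_route_cmd rtb-0123 nat-0456` prints the command for review before it is run with real IDs.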

 

 

Cloudera Employee

I experienced this issue when setting the parameter

 

associatePublicIpAddresses: false

 

The default seems to be 'true'.

The notes for this parameter say ...

 

# Whether to associate a public IP address with instances or not. If this is false
# we expect instances to be able to access the internet using a NAT instance
#
# Currently the only way to get optimal S3 data transfer performance is to assign
# public IP addresses to your instances and not use NAT (public subnet type of setup)
#
# See: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ip-addressing.html

 

So it makes sense that if you set this parameter to false, you may need to configure NAT as described above.