
Cloudera Director bootstrap failing on 'Installing cloudera-manager-daemons package'

Explorer

While running the 'cloudera-director bootstrap' command for the Cloudera Director client, we see the following error in the logs during the 'Installing cloudera-manager-daemons package' step. This exact configuration was used to start a cluster the previous day, so this is probably something in the environment, but I can't find any indication of what the cause may be.

cloudera-director bootstrap cluster.conf 
Process logs can be found at /home/ec2-user/.cloudera-director/logs/application.log
Plugins will be loaded from /var/lib/cloudera-director-client/plugins
Cloudera Director 1.5.2 initializing ...
Installing Cloudera Manager ...
* Starting ...... done
* Requesting an instance for Cloudera Manager .............................. done
* Inspecting capabilities of 10.172.4.89 .......... done
* Installing screen package (1/1) .......... done
* Running custom bootstrap script on 10.172.4.89 ......... done
* Waiting for SSH access to 10.172.4.89 on port 22 ....... done
* Inspecting capabilities of 10.172.4.89 ............... done
* Normalizing 10.172.4.89 ....... done
* Installing ntp package (1/4) ...... done
* Installing curl package (2/4) ....... done
* Installing nscd package (3/4) ....... done
* Installing gdisk package (4/4) ......................... done
* Resizing instance root partition ............ done
* Rebooting 10.172.4.89 ..... done
* Waiting for 10.172.4.89 to boot ....... done
* Mounting all instance disk drives ............ done
* Waiting for new external database servers to start running ........... done
* Installing repositories for Cloudera Manager ......... done
* Installing cloudera-manager-daemons package (1/2) .......

 

[2016-01-05 18:46:26] INFO  [io-thread-6] - ssh:10.172.4.89: Error: No matching Packages to list
[2016-01-05 18:46:26] INFO  [io-thread-6] - ssh:10.172.4.89: http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4/repodata/repomd.xml: [Errno 14] PYCURL ERROR 56 - "Failure when receiving data from the peer"
[2016-01-05 18:46:26] INFO  [io-thread-6] - ssh:10.172.4.89: Trying other mirror.
[2016-01-05 18:46:26] INFO  [io-thread-6] - ssh:10.172.4.89: Error: Cannot retrieve repository metadata (repomd.xml) for repository: cloudera-manager. Please verify its path and try again
[2016-01-05 18:46:27] INFO  [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=500, pipeline=022613a2-a9e2-49e1-86cf-b4d35445a285,  ...
[2016-01-05 18:46:27] INFO  [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: >> SshJobFailFastWithOutputLogging/3 [sudo yum -C list installed 'cloudera-manager-daemons' 2>&1 > /dev/null, 10.172.4.89, SshCredentials ...
[2016-01-05 18:46:27] INFO  [pipeline-thread-1] - c.cloudera.launchpad.sshj.SshJClient: Attempting SSH connection.
[2016-01-05 18:46:27] WARN  [reader] - c.c.l.sshj.TrustAnyHostKeyVerifier: Host key for 10.172.4.89 was automatically accepted
[2016-01-05 18:46:28] INFO  [io-thread-6] - ssh:10.172.4.89: Error: No matching Packages to list
[2016-01-05 18:46:28] ERROR [pipeline-thread-1] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.common.ssh.SshException: Script execution failed with code 1. Script: sudo yum -C list installed 'cloudera-manager-daemons' 2>&1 > /dev/null
	at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:45) ~[launchpad-pipeline-common-1.5.2.jar!/:1.5.2]
	at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging.run(SshJobFailFastWithOutputLogging.java:27) ~[launchpad-pipeline-common-1.5.2.jar!/:1.5.2]
	at com.cloudera.launchpad.pipeline.job.Job3.runUnchecked(Job3.java:32) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
	at com.cloudera.launchpad.pipeline.job.Job3$$FastClassBySpringCGLIB$$54178503.invoke(<generated>) ~[spring-core-4.1.5.RELEASE.jar!/:1.5.2]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:717) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:55) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
	at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.0.jar!/:3.1.0]
	at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:51) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
	at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.6.0_30]
	at java.lang.reflect.Method.invoke(Method.java:622) ~[na:1.6.0_30]
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:653) ~[spring-aop-4.1.5.RELEASE.jar!/:4.1.5.RELEASE]
	at com.cloudera.launchpad.pipeline.ssh.SshJobFailFastWithOutputLogging$$EnhancerBySpringCGLIB$$cde3fbd8.runUnchecked(<generated>) ~[spring-core-4.1.5.RELEASE.jar!/:1.5.2]
	at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:165) [launchpad-pipeline-1.5.2.jar!/:1.5.2]
	at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:136) [launchpad-pipeline-1.5.2.jar!/:1.5.2]
	at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) ~[guava-retrying-1.0.6.jar!/:na]
	at com.github.rholder.retry.Retryer.call(Retryer.java:110) ~[guava-retrying-1.0.6.jar!/:na]
	at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:98) ~[launchpad-pipeline-1.5.2.jar!/:1.5.2]
	at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:120) ~[launchpad-pipeline-database-1.5.2.jar!/:1.5.2]
	at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) ~[launchpad-common-1.5.2.jar!/:1.5.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.6.0_30]
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.6.0_30]
	at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.6.0_30]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) ~[na:1.6.0_30]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.6.0_30]
	at java.lang.Thread.run(Thread.java:701) ~[na:1.6.0_30]

I am able to SSH into the server manually; when I try to run the cloudera-manager-daemons installation by hand, I get this output:

 

[ec2-user@i-23b5d6a2.poc-dis11.daws1 ~]$ sudo yum -C list installed 'cloudera-manager-daemons'
Loaded plugins: amazon-id, lsbvars, rhui-lb, security
Error: No matching Packages to list
[ec2-user@i-23b5d6a2.poc-dis11.daws1 ~]$ sudo yum -C install 'cloudera-manager-daemons'
Loaded plugins: amazon-id, lsbvars, rhui-lb, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package cloudera-manager-daemons.x86_64 0:5.4.9-1.cm549.p0.9.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==========================================================================================
 Package                    Arch     Version                  Repository          Size
==========================================================================================
Installing:
 cloudera-manager-daemons   x86_64   5.4.9-1.cm549.p0.9.el6   cloudera-manager   638 M

Transaction Summary
==========================================================================================
Install       1 Package(s)

Total download size: 638 M
Installed size: 902 M
Is this ok [y/N]: y
Downloading Packages:


Error Downloading Packages:
  cloudera-manager-daemons-5.4.9-1.cm549.p0.9.el6.x86_64: Caching enabled but no local cache of /var/cache/yum/x86_64/6Server/cloudera-manager/packages/cloudera-manager-daemons-5.4.9-1.cm549.p0.9.el6.x86_64.rpm from cloudera-manager

Since I know of no changes in our files, I am not sure why the yum / RPM installations from Cloudera no longer appear to be working.
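The next thing we can try is to fetch the repository metadata directly and to retry the install without yum's cache-only flag; -C (--cacheonly) restricts yum to its local cache, which by itself explains the 'no local cache' message above. A rough sketch:

# Fetch the repo metadata that yum could not retrieve (URL taken from the error log).
curl -v -o /dev/null http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4/repodata/repomd.xml

# Clear cached metadata and retry the install with the network enabled (no -C).
sudo yum clean all
sudo yum install cloudera-manager-daemons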


7 REPLIES

Master Collaborator

Hi Dustin -

Is this transient? It could be a temporary Cloudera CDN / download site error. Did the AWS environment change in a way that could impact outbound internet access (new security group rules, network ACLs, NAT instance changes, etc.)?

It could help to add "yum clean all" to the bootstrap script to make sure there is no stale repository metadata.
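A minimal sketch of what that could look like as the custom bootstrap script (assuming the usual convention that the script runs with root privileges and exits 0 on success):

#!/bin/sh
# Drop any stale yum metadata before Director starts installing packages;
# stale mirror metadata is one possible cause of repomd.xml retrieval errors.
yum clean all
exit 0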

 

Explorer

Andrei,

This was happening all day yesterday, with at least 4 attempts made. I tried again this morning, and while it passed the cloudera-manager-daemons step, this time it failed with basically the same error on the 'Installing cloudera-manager-server' step. I can check with our operations team to see if there were any network changes in AWS, but I don't think so. I also checked the security group, and all traffic is enabled on both the inbound and outbound rules.

I will add yum cleanup commands to the end of our custom bootstrap to see if that works:

yum clean all
yum makecache

 

Thanks

Master Collaborator

This may also happen due to one of the recent maintenance releases.

Please check the following properties for accuracy:

  • repository: and repositoryKeyUrl: within the cloudera-manager {} block
  • products {} and parcelRepositories: within the cluster {} block

Explorer

Here was our initial configuration (#1):

  • no repository, repositoryKeyUrl, or parcelRepositories settings specified in the cloudera-manager block
  • inside cluster{} -> products{} we had 'CDH: 5' configured

 

We also tried updating this to a specific version with these configurations (#2, spelled out as a snippet below):

  • only parcelRepositories: specified, set to http://archive.cloudera.com/cdh5/parcels/5.3.3/
  • no repository or repositoryKeyUrl settings specified
  • inside cluster{} -> products{} we have 'CDH: 5.3.3' configured
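In config-file terms, that second setup looks roughly like this (a sketch reconstructed from the bullets above; unrelated settings elided):

cluster {
    products {
        CDH: 5.3.3
    }
    parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.3.3/"]
    [...]
}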

Startup Attempts

  1. The first attempt today with configuration #1 was unsuccessful.
  2. Another attempt with conf #1, adding `yum clean all` and `yum makecache`, was successful.
  3. We updated to conf #2, keeping the yum commands, and ran into the same error.

I don't think the yum commands made a difference, but it would be nice to know what we should specify in the repository and products configurations to install a specific version of CDH.

Master Collaborator

Here is an example of how you can have strict control over the version numbers of all the different components:

cloudera-manager {
    repository: "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.5.1/"
    repositoryKeyUrl: "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera"
    [...]
}

cluster {
    products {
        CDH: 5.5.1
    }
    parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.5.1/"]
    [...]
}

It's important that when you override the parcel version, you also change the Cloudera Manager repository to stay in sync. Older versions of Cloudera Manager do not support newer versions of CDH, and in some cases you may get hard-to-debug, non-deterministic failures.

Do you have a NAT instance in your environment? Could it be affected by high load?
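One rough way to check is from one of the cluster instances (the URL below is just the repository URL from your error log; the loop is there to catch intermittent failures):

# Time repeated fetches of the CM repo metadata; failures or wildly
# varying timings can point at a saturated NAT instance.
for i in 1 2 3 4 5; do
  time curl -fsS -o /dev/null http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.4/repodata/repomd.xml
done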

Explorer

Based on your response, we were able to get a cluster created with a specific version. However, we are still experiencing about a 50% failure rate when installing. If we can confirm there are no issues on Cloudera's end, then we will have to start looking into issues with the AWS account's NAT.

ACCEPTED SOLUTION

Explorer

After experiencing this for a week, it appears to have resolved itself. The most likely root cause was networking issues within AWS; however, we were not able to find anything specific in our own instance.