cluster bootstrap failed, how to best fix the problem?

Explorer

Using Cloudera Director 2.1, I started a CM (latest version), then quit out of "add cluster" so I could go back and edit templates first. Then I attempted to add a cdh5.3.3 cluster. After a while, this error appeared:

CLOUDERAMANAGEREXCEPTION{MESSAGE="API CALL TO CLOUDERA MANAGER FAILED. METHOD=ROLESRESOURCE.CREATEROLES",CAUSECLASS=CLASS JAVAX.WS.RS.BADREQUESTEXCEPTION, CAUSEMESSAGE="HTTP 400 BAD REQUEST"}

This seems to correspond to:

[2016-07-27 00:16:47] INFO  [pipeline-thread-3] - c.c.l.bootstrap.cluster.AddServices: Creating and configuring services [HDFS, HIVE, HUE, OOZIE, SQOOP, YARN, ZOOKEEPER]
[2016-07-27 00:16:48] INFO  [pipeline-thread-3] - c.c.launchpad.pipeline.AbstractJob: Creating cluster services
[2016-07-27 00:16:49] INFO  [pipeline-thread-3] - c.c.launchpad.pipeline.AbstractJob: Assigning roles to instances
[2016-07-27 00:16:49] INFO  [pipeline-thread-3] - c.c.l.bootstrap.cluster.AddServices: Creating 10 roles for service CD-HDFS-aPCZFODJ
[2016-07-27 00:16:49] ERROR [pipeline-thread-3] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: ClouderaManagerException{message="API call to Cloudera Manager failed. Method=RolesResource.createRoles",causeClass=class javax.ws.rs.BadRequestException, causeMessage="HTTP 400 Bad Request"}
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:321) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:100) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job5.runUnchecked(Job5.java:34) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job5$$FastClassBySpringCGLIB$$54178505.invoke(<generated>) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:67) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.2.jar!/:3.1.2]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:63) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at sun.reflect.GeneratedMethodAccessor206.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_72]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_72]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices$$EnhancerBySpringCGLIB$$904ca61c.runUnchecked(<generated>) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:159) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:130) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) [guava-retrying-1.0.6.jar!/:na]
        at com.github.rholder.retry.Retryer.call(Retryer.java:110) [guava-retrying-1.0.6.jar!/:na]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:99) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:125) [launchpad-pipeline-database-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) [launchpad-common-2.1.0.jar!/:2.1.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_72]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_72]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: com.cloudera.api.ext.ClouderaManagerException: API call to Cloudera Manager failed. Method=RolesResource.createRoles
        at com.cloudera.api.ext.ClouderaManagerClientProxy.invoke(ClouderaManagerClientProxy.java:97) ~[launchpad-cloudera-manager-api-ext-2.1.0.jar!/:2.1.0]
        at com.sun.proxy.$Proxy239.createRoles(Unknown Source) ~[na:na]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.manuallyAssignRoles(AddServices.java:404) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:287) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        ... 33 common frames omitted

My template is in us-east-1 using ami-1643ff7e on c3.2xlarge with SSH user ec2-user; this is what I was using in Cloudera Director 1.5. I am not using an IAM profile or placement group. I put it on a subnet (the machines were running).

 

How should I go about fixing this? 

 


Re: cluster bootstrap failed, how to best fix the problem?

Explorer

Some more detail: CM is 5.7.1 and I can sign in as admin. No clusters on it yet. (I terminated the failed cdh5.3.3 attempt.)

 

Re: cluster bootstrap failed, how to best fix the problem?

Explorer

Update on this issue: I re-tried a cluster start with two differences:

 

  • it had only one master instead of 2
  • the archive repo was /5.3 instead of /5.3.3

It also was done at a different time of day (1-2hr later). This time, it came up fine! Until I try 2 masters again, I won't know whether that is the problem or if there is a timeout sensitivity (sometimes I have seen this with Whirr).

 

Re: cluster bootstrap failed, how to best fix the problem?

Cloudera Employee

So it looks like the previously aborted add-cluster operation led to the later add-cluster error, right?

 

UnrecoverablePipelineError is a severe error in Director, and it normally blocks later operations until it is fixed. I think the cause is that when the first cluster creation was aborted, the cluster and hosts were either not created or not in a ready state, so the addRole/addService operation also failed, since it requires the cluster and hosts to be in a ready state.

 

One quick fix you can try is to terminate the cluster deployment and retry with the templates you defined. If you still run into issues, you can terminate the CM deployment and add a new CM deployment.

 

If you still have issues, you could contact Cloudera Support and request a WebEx troubleshooting session. We can use internal tools to help you debug further.


 

 

Re: cluster bootstrap failed, how to best fix the problem?

Explorer

More specifically, nothing was aborted. All I did was (1) add a CM, which started successfully, and (2) click on another page instead of adding a cluster after that finished.

 

Re: cluster bootstrap failed, how to best fix the problem?

Cloudera Employee

Let me try to reproduce your issue here. Could you give me your detailed CM and cluster configuration? Please also include your instance templates for the master, worker, and gateway nodes. You can send me some screenshots if that is easier for you.

 

Thanks,

John

Re: cluster bootstrap failed, how to best fix the problem?

Explorer

More info: I created a cdh5.7.2 cluster ("latest") using the same config but with 2 master nodes, and cluster creation failed with:

CLOUDERAMANAGEREXCEPTION{MESSAGE="API CALL TO CLOUDERA MANAGER FAILED. METHOD=ROLESRESOURCE.CREATEROLES",CAUSECLASS=CLASS JAVAX.WS.RS.BADREQUESTEXCEPTION, CAUSEMESSAGE="HTTP 400 BAD REQUEST"}

It failed at step 572/581, if that helps.

 

It is looking like having 2 masters is the problem.

 

The corresponding application.log point is:

[2016-07-28 01:32:14] INFO  [pipeline-thread-27] - c.c.l.bootstrap.cluster.AddServices: Creating and configuring services [HDFS, HIVE, HUE, OOZIE, SQOOP, YARN, ZOOKEEPER]
[2016-07-28 01:32:14] INFO  [pipeline-thread-27] - c.c.launchpad.pipeline.AbstractJob: Creating cluster services
[2016-07-28 01:32:14] INFO  [pipeline-thread-27] - c.c.launchpad.pipeline.AbstractJob: Assigning roles to instances
[2016-07-28 01:32:14] INFO  [pipeline-thread-27] - c.c.l.bootstrap.cluster.AddServices: Creating 10 roles for service CD-HDFS-ROVGroaX
[2016-07-28 01:32:14] ERROR [pipeline-thread-27] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: ClouderaManagerException{message="API call to Cloudera Manager failed. Method=RolesResource.createRoles",causeClass=class javax.ws.rs.BadRequestException, causeMessage="HTTP 400 Bad Request"}
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:321) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:100) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job5.runUnchecked(Job5.java:34) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job5$$FastClassBySpringCGLIB$$54178505.invoke(<generated>) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:67) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.2.jar!/:3.1.2]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:63) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at sun.reflect.GeneratedMethodAccessor206.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_72]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_72]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:655) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices$$EnhancerBySpringCGLIB$$904ca61c.runUnchecked(<generated>) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:159) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:130) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) [guava-retrying-1.0.6.jar!/:na]
        at com.github.rholder.retry.Retryer.call(Retryer.java:110) [guava-retrying-1.0.6.jar!/:na]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:99) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:125) [launchpad-pipeline-database-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) [launchpad-common-2.1.0.jar!/:2.1.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_72]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_72]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: com.cloudera.api.ext.ClouderaManagerException: API call to Cloudera Manager failed. Method=RolesResource.createRoles
        at com.cloudera.api.ext.ClouderaManagerClientProxy.invoke(ClouderaManagerClientProxy.java:97) ~[launchpad-cloudera-manager-api-ext-2.1.0.jar!/:2.1.0]
        at com.sun.proxy.$Proxy239.createRoles(Unknown Source) ~[na:na]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.manuallyAssignRoles(AddServices.java:404) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:287) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        ... 33 common frames omitted
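
For anyone comparing several failed runs, the summary line of the exception can be pulled out of application.log with a small script. This is only a minimal sketch: the regular expression is inferred from the two log excerpts in this thread, not from any documented Director log format, and the function name `parse_cm_exception` is made up for illustration.

```python
import re

# Pattern inferred from the log excerpts in this thread (an assumption,
# not a documented format).
EXC_RE = re.compile(
    r'ClouderaManagerException\{message="(?P<message>[^"]*)",'
    r'causeClass=class (?P<cause_class>[\w.$]+),\s*'
    r'causeMessage="(?P<cause_message>[^"]*)"\}'
)


def parse_cm_exception(line):
    """Return the exception fields found in one log line, or None."""
    match = EXC_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # The failing API method is embedded in the message text.
    method = re.search(r"Method=([\w.]+)", info["message"])
    info["method"] = method.group(1) if method else None
    return info


sample = (
    "com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: "
    'ClouderaManagerException{message="API call to Cloudera Manager failed. '
    'Method=RolesResource.createRoles",'
    "causeClass=class javax.ws.rs.BadRequestException, "
    'causeMessage="HTTP 400 Bad Request"}'
)

print(parse_cm_exception(sample))
```

Running it over each line of the log and keeping the non-None results shows quickly whether every failure is the same createRoles call.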

 

Re: cluster bootstrap failed, how to best fix the problem?

Cloudera Employee

Actually, this is a known issue. Director 2.1 does not allow certain master roles to be placed on multiple hosts of the same instance group. If you create two master nodes in one group, by default all the master roles are assigned to both master nodes, which triggers this error.

 

As a workaround, create two master groups, each containing one master node.
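
To illustrate the difference between the failing layout and the workaround, here is a rough sketch of a pre-flight check on a planned cluster layout. It is not Director code and does not use any Director API. The group names, the dictionary shape, and especially the `SINGLE_HOST_ROLES` set are assumptions made for the example, since the affected roles are not listed in this thread.

```python
# Hypothetical set of roles that should not share a multi-host group.
# The thread only says "certain master roles"; adjust to your case.
SINGLE_HOST_ROLES = {"NAMENODE", "SECONDARYNAMENODE", "RESOURCEMANAGER"}


def find_conflicts(groups):
    """List (group, role, count) where a single-host role sits in a group
    with more than one instance.

    groups maps a group name to {"count": int, "roles": set of role names}.
    """
    conflicts = []
    for name, spec in sorted(groups.items()):
        if spec["count"] > 1:
            for role in sorted(spec["roles"] & SINGLE_HOST_ROLES):
                conflicts.append((name, role, spec["count"]))
    return conflicts


# Layout like the one that failed: two masters in one group.
failing = {
    "masters": {"count": 2, "roles": {"NAMENODE", "RESOURCEMANAGER"}},
    "workers": {"count": 5, "roles": {"DATANODE", "NODEMANAGER"}},
}

# Workaround layout: two master groups with one node each.
workaround = {
    "masters-1": {"count": 1, "roles": {"NAMENODE"}},
    "masters-2": {"count": 1, "roles": {"RESOURCEMANAGER"}},
    "workers": {"count": 5, "roles": {"DATANODE", "NODEMANAGER"}},
}

print(find_conflicts(failing))
print(find_conflicts(workaround))
```

The first layout reports a conflict for each master role in the two-node group, while the split layout reports none.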

 

If the reason for setting up two master nodes is high availability, here is the link to the Director 2.1 HA documentation:

http://www.cloudera.com/documentation/director/latest/topics/director_create_ha_clusters.html#concep...

 

You can use https://github.com/cloudera/director-scripts/blob/master/configs/aws.ha.reference.conf as a reference to set up an HA cluster.