Explorer
Posts: 22
Registered: 07-18-2016

cluster bootstrap failed, how to best fix the problem?

Using Cloudera Director 2.1, I started a CM (latest version), then quit out of the "add cluster" step so I could go back and edit the templates first. Then I attempted to add a cdh5.3.3 cluster. After a while, this error appeared:

CLOUDERAMANAGEREXCEPTION{MESSAGE="API CALL TO CLOUDERA MANAGER FAILED. METHOD=ROLESRESOURCE.CREATEROLES",CAUSECLASS=CLASS JAVAX.WS.RS.BADREQUESTEXCEPTION, CAUSEMESSAGE="HTTP 400 BAD REQUEST"}

This seems to correspond to:

[2016-07-27 00:16:47] INFO  [pipeline-thread-3] - c.c.l.bootstrap.cluster.AddServices: Creating and configuring services [HDFS, HIVE, HUE, OOZIE, SQOOP, YARN, ZOOKEEPER]
[2016-07-27 00:16:48] INFO  [pipeline-thread-3] - c.c.launchpad.pipeline.AbstractJob: Creating cluster services
[2016-07-27 00:16:49] INFO  [pipeline-thread-3] - c.c.launchpad.pipeline.AbstractJob: Assigning roles to instances
[2016-07-27 00:16:49] INFO  [pipeline-thread-3] - c.c.l.bootstrap.cluster.AddServices: Creating 10 roles for service CD-HDFS-aPCZFODJ
[2016-07-27 00:16:49] ERROR [pipeline-thread-3] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: ClouderaManagerException{message="API call to Cloudera Manager failed. Method=RolesResource.createRoles",causeClass=class javax.ws.rs.BadRequestException, causeMessage="HTTP 400 Bad Request"}
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:321) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:100) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job5.runUnchecked(Job5.java:34) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job5$$FastClassBySpringCGLIB$$54178505.invoke(<generated>) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:67) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.2.jar!/:3.1.2]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:63) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at sun.reflect.GeneratedMethodAccessor206.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_72]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_72]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices$$EnhancerBySpringCGLIB$$904ca61c.runUnchecked(<generated>) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:159) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:130) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) [guava-retrying-1.0.6.jar!/:na]
        at com.github.rholder.retry.Retryer.call(Retryer.java:110) [guava-retrying-1.0.6.jar!/:na]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:99) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:125) [launchpad-pipeline-database-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) [launchpad-common-2.1.0.jar!/:2.1.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_72]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_72]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: com.cloudera.api.ext.ClouderaManagerException: API call to Cloudera Manager failed. Method=RolesResource.createRoles
        at com.cloudera.api.ext.ClouderaManagerClientProxy.invoke(ClouderaManagerClientProxy.java:97) ~[launchpad-cloudera-manager-api-ext-2.1.0.jar!/:2.1.0]
        at com.sun.proxy.$Proxy239.createRoles(Unknown Source) ~[na:na]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.manuallyAssignRoles(AddServices.java:404) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:287) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        ... 33 common frames omitted
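
In this thread, both failures look the same in application.log: the last "Creating N roles for service X" step comes right before an UnrecoverablePipelineError that wraps the ClouderaManagerException. The minimal Python sketch below pulls that information out of a long log. The regular expressions match the Director 2.1 lines quoted here, so treat them as an assumption for any other version. `find_failures` and `PipelineFailure` are hypothetical helper names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Written against the Director 2.1 application.log lines quoted in this thread;
# other versions may format these messages differently.
ROLE_STEP = re.compile(r"Creating (\d+) roles for service (\S+)")
PIPELINE_ERROR = re.compile(
    r"UnrecoverablePipelineError: ClouderaManagerException\{"
    r'message="[^"]*Method=(?P<method>[\w.]+)",'
    r"causeClass=class (?P<cause_class>[\w.$]+), "
    r'causeMessage="(?P<cause>[^"]*)"\}'
)


class PipelineFailure(NamedTuple):
    method: str                # Cloudera Manager API method that failed
    cause_class: str           # exception class reported by the API client
    cause: str                 # e.g. the HTTP status text
    service: Optional[str]     # service from the last "Creating N roles" step seen
    role_count: Optional[int]  # N from that step


def find_failures(lines: Iterable[str]) -> List[PipelineFailure]:
    """Scan log lines and return one PipelineFailure per UnrecoverablePipelineError."""
    failures: List[PipelineFailure] = []
    service: Optional[str] = None
    role_count: Optional[int] = None
    for line in lines:
        step = ROLE_STEP.search(line)
        if step:
            # Remember the most recent role-creation step as context for a later error.
            role_count, service = int(step.group(1)), step.group(2)
            continue
        error = PIPELINE_ERROR.search(line)
        if error:
            failures.append(
                PipelineFailure(
                    error.group("method"),
                    error.group("cause_class"),
                    error.group("cause"),
                    service,
                    role_count,
                )
            )
    return failures
```

Run against the first excerpt above, it reports `RolesResource.createRoles` failing with `HTTP 400 Bad Request` immediately after Director started creating 10 roles for `CD-HDFS-aPCZFODJ`. That points the investigation at role placement rather than at instance provisioning.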

My template is on us-east-1 using ami-1643ff7e on c3.2xlarge with ssh user ec2-user; this is what I was using in Cloudera Director 1.5. I am not using an IAM profile or a placement group. I put the instances on a subnet (the machines were running).

How should I go about fixing this?

Explorer
Posts: 22
Registered: 07-18-2016

Re: cluster bootstrap failed, how to best fix the problem?

Some more detail: CM is 5.7.1, and I can sign in as admin. There are no clusters on it yet (I terminated the failed cdh5.3.3 attempt).

Explorer
Posts: 22
Registered: 07-18-2016

Re: cluster bootstrap failed, how to best fix the problem?

Update on this issue: I retried the cluster creation with two differences:

  • it had only one master instead of 2
  • the archive repo was /5.3 instead of /5.3.3

It was also done at a different time of day (1-2 hours later). This time, it came up fine! Until I try 2 masters again, I won't know whether that is the problem or whether there is some timing sensitivity (I have sometimes seen this with Whirr).

Cloudera Employee
Posts: 4
Registered: 07-27-2016

Re: cluster bootstrap failed, how to best fix the problem?

So it looks like the previously aborted add-cluster operation led to the later add-cluster error, right?

UnrecoverablePipelineError is a severe error in Director, and it normally blocks later operations until it is fixed. I think the cause is that when the first cluster creation was aborted, the cluster and its hosts were either not created or not in a ready state, so the addRole/addService operation also failed, since it requires the cluster and hosts to be ready.

One quick fix you can try is to terminate the cluster deployment and retry with the templates you defined. If you still run into issues, you can try terminating the CM deployment and adding a new CM deployment.

If you still have issues, you could contact Cloudera Support and request a WebEx troubleshooting session. We can use internal tools to help you debug further.

Explorer
Posts: 22
Registered: 07-18-2016

Re: cluster bootstrap failed, how to best fix the problem?

More specifically, nothing was aborted. All I did was (1) add a CM, which started successfully, and (2) click on another page instead of adding a cluster after that finished.

Cloudera Employee
Posts: 4
Registered: 07-27-2016

Re: cluster bootstrap failed, how to best fix the problem?

Let me try to reproduce your issue here. Could you give me your detailed CM and cluster configuration, along with your instance templates for the master, worker, and gateway nodes? You can send me some screenshots if that is easier for you.

Thanks,

John

Explorer
Posts: 22
Registered: 07-18-2016

Re: cluster bootstrap failed, how to best fix the problem?

More info: I created a cdh5.7.2 cluster ("latest") using the same config but with 2 master nodes, and cluster creation failed with:

CLOUDERAMANAGEREXCEPTION{MESSAGE="API CALL TO CLOUDERA MANAGER FAILED. METHOD=ROLESRESOURCE.CREATEROLES",CAUSECLASS=CLASS JAVAX.WS.RS.BADREQUESTEXCEPTION, CAUSEMESSAGE="HTTP 400 BAD REQUEST"}

It failed at step 572/581, if that helps.

It is looking like having 2 masters is the problem.

The corresponding application.log excerpt is:

[2016-07-28 01:32:14] INFO  [pipeline-thread-27] - c.c.l.bootstrap.cluster.AddServices: Creating and configuring services [HDFS, HIVE, HUE, OOZIE, SQOOP, YARN, ZOOKEEPER]
[2016-07-28 01:32:14] INFO  [pipeline-thread-27] - c.c.launchpad.pipeline.AbstractJob: Creating cluster services
[2016-07-28 01:32:14] INFO  [pipeline-thread-27] - c.c.launchpad.pipeline.AbstractJob: Assigning roles to instances
[2016-07-28 01:32:14] INFO  [pipeline-thread-27] - c.c.l.bootstrap.cluster.AddServices: Creating 10 roles for service CD-HDFS-ROVGroaX
[2016-07-28 01:32:14] ERROR [pipeline-thread-27] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: ClouderaManagerException{message="API call to Cloudera Manager failed. Method=RolesResource.createRoles",causeClass=class javax.ws.rs.BadRequestException, causeMessage="HTTP 400 Bad Request"}
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:321) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:100) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job5.runUnchecked(Job5.java:34) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.job.Job5$$FastClassBySpringCGLIB$$54178505.invoke(<generated>) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:97) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler$1.call(PipelineJobProfiler.java:67) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.codahale.metrics.Timer.time(Timer.java:101) ~[metrics-core-3.1.2.jar!/:3.1.2]
        at com.cloudera.launchpad.pipeline.PipelineJobProfiler.profileJobRun(PipelineJobProfiler.java:63) ~[launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at sun.reflect.GeneratedMethodAccessor206.invoke(Unknown Source) ~[na:na]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_72]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_72]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:655) ~[spring-aop-4.2.4.RELEASE.jar!/:4.2.4.RELEASE]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices$$EnhancerBySpringCGLIB$$904ca61c.runUnchecked(<generated>) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:159) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner$JobCallable.call(PipelineRunner.java:130) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78) [guava-retrying-1.0.6.jar!/:na]
        at com.github.rholder.retry.Retryer.call(Retryer.java:110) [guava-retrying-1.0.6.jar!/:na]
        at com.cloudera.launchpad.pipeline.util.PipelineRunner.attemptMultipleJobExecutionsWithRetries(PipelineRunner.java:99) [launchpad-pipeline-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.pipeline.DatabasePipelineRunner.run(DatabasePipelineRunner.java:125) [launchpad-pipeline-database-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.ExceptionHandlingRunnable.run(ExceptionHandlingRunnable.java:57) [launchpad-common-2.1.0.jar!/:2.1.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_72]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_72]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: com.cloudera.api.ext.ClouderaManagerException: API call to Cloudera Manager failed. Method=RolesResource.createRoles
        at com.cloudera.api.ext.ClouderaManagerClientProxy.invoke(ClouderaManagerClientProxy.java:97) ~[launchpad-cloudera-manager-api-ext-2.1.0.jar!/:2.1.0]
        at com.sun.proxy.$Proxy239.createRoles(Unknown Source) ~[na:na]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.manuallyAssignRoles(AddServices.java:404) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        at com.cloudera.launchpad.bootstrap.cluster.AddServices.run(AddServices.java:287) ~[launchpad-bootstrap-2.1.0.jar!/:2.1.0]
        ... 33 common frames omitted

Cloudera Employee
Posts: 4
Registered: 07-27-2016

Re: cluster bootstrap failed, how to best fix the problem?

Actually, this is a known issue. Director 2.1 does not allow certain master roles to be placed on multiple hosts of the same instance group. So if you create two master nodes in one group, by default all the master roles are assigned to both master nodes, and role creation fails.

If you want a workaround, create two master groups and make each group contain one master node.
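
The Python sketch below checks a cluster layout for this known issue before launch. It uses a hypothetical, simplified model that is loosely based on the instance groups in Director's configuration files: each group has a `count` and a mapping of service type to role types. `InstanceGroup` and `find_conflicting_groups` are hypothetical names. The post does not say which master roles are affected, so the caller passes in the set of restricted roles, and the roles used below are purely illustrative. The sketch only detects the problem. It does not decide how roles should be split between the new groups; for an HA layout, follow the Director HA documentation.

```python
from typing import Dict, List, Mapping, NamedTuple, Set


class InstanceGroup(NamedTuple):
    """Hypothetical, simplified stand-in for one instance group in a cluster template."""
    count: int                      # number of instances in the group
    roles: Mapping[str, List[str]]  # service type -> role types, e.g. {"HDFS": ["NAMENODE"]}


def find_conflicting_groups(
    groups: Mapping[str, InstanceGroup], restricted_roles: Set[str]
) -> Dict[str, List[str]]:
    """Return {group name: restricted roles} for every group that has more than one
    instance and also carries a role that may not span multiple hosts of one group."""
    conflicts: Dict[str, List[str]] = {}
    for name, group in groups.items():
        if group.count <= 1:
            continue  # a single-instance group cannot put a role on two of its hosts
        offending = sorted(
            role
            for service_roles in group.roles.values()
            for role in service_roles
            if role in restricted_roles
        )
        if offending:
            conflicts[name] = offending
    return conflicts


if __name__ == "__main__":
    # Illustrative only: the thread does not say which roles Director 2.1 restricts.
    restricted = {"NAMENODE", "RESOURCEMANAGER"}

    # Layout that hits the issue: one "masters" group with two instances.
    failing = {
        "masters": InstanceGroup(2, {"HDFS": ["NAMENODE"], "YARN": ["RESOURCEMANAGER"]}),
        "workers": InstanceGroup(3, {"HDFS": ["DATANODE"], "YARN": ["NODEMANAGER"]}),
    }
    # Workaround layout: two master groups with one instance each (role split is illustrative).
    workaround = {
        "masters-1": InstanceGroup(1, {"HDFS": ["NAMENODE"]}),
        "masters-2": InstanceGroup(1, {"YARN": ["RESOURCEMANAGER"]}),
        "workers": InstanceGroup(3, {"HDFS": ["DATANODE"], "YARN": ["NODEMANAGER"]}),
    }
    print(find_conflicting_groups(failing, restricted))     # {'masters': ['NAMENODE', 'RESOURCEMANAGER']}
    print(find_conflicting_groups(workaround, restricted))  # {}
```

The check flags only groups with `count` greater than 1. This matches the rule as stated: a restricted role may appear in several groups, but not on several hosts of the same group.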

If the reason for setting up two master nodes is high availability, here is the link to the Director 2.1 HA documentation:

http://www.cloudera.com/documentation/director/latest/topics/director_create_ha_clusters.html#concep...

You can use https://github.com/cloudera/director-scripts/blob/master/configs/aws.ha.reference.conf as a reference for setting up an HA cluster.