Cluster start fails on firstRun

Contributor

I'm using Director's REST API to start a cluster on an existing deployment. Everything runs fine until the firstRun call, where I see this error in the Director logs:

[2017-06-29 09:53:38.054 +0000] INFO  [p-859d12b97f1e-DefaultBootstrapClusterJob] POST /api/v8/environments/dwh/deployments/manager_live/clusters com.cloudera.launchpad.bootstrap.cluster.firstrun.InvokeFirstRunClusterCommandV7 - c.c.l.pipeline.util.PipelineRunner: >> InvokeFirstRunClusterCommandV7/3 [Environment{name='dwh', provider=InstanceProviderConfig{type='aws'}, credentials=SshCred ...
[2017-06-29 09:53:38.094 +0000] INFO  [p-859d12b97f1e-DefaultBootstrapClusterJob] POST /api/v8/environments/dwh/deployments/manager_live/clusters com.cloudera.launchpad.bootstrap.cluster.firstrun.InvokeFirstRunClusterCommandV7 - c.c.launchpad.pipeline.AbstractJob: Calling firstRun on cluster dwh_live
[2017-06-29 09:53:38.142 +0000] ERROR [p-859d12b97f1e-DefaultBootstrapClusterJob] POST /api/v8/environments/dwh/deployments/manager_live/clusters com.cloudera.launchpad.bootstrap.cluster.firstrun.InvokeFirstRunClusterCommandV7 - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.api.ext.ClouderaManagerException: API call to Cloudera Manager failed. Method=ClustersResourceV7.firstRun. Response Status Code: 400. - Cause: javax.ws.rs.BadRequestException HTTP 400 Bad Request
	at com.cloudera.api.ext.ClouderaManagerClientProxy.invoke(ClouderaManagerClientProxy.java:137)
	at com.sun.proxy.$Proxy257.firstRun(Unknown Source)
...

I looked for logs on the Manager, but could not find anything useful. Which log file should give me more insight?

The cluster is fairly simple: a master node, one worker and a gateway. The services use an external RDS database, the same one that the Manager uses.

What can be the issue here?

 

1 ACCEPTED SOLUTION

Super Collaborator

Hi ztoth,

 

When Cloudera Manager returns a 400 error, there are often clues about what went wrong in Cloudera Manager's own logs (/var/log/cloudera-scm-server). If there's not much to go on there, it's worth trying again with Cloudera Manager configured to emit debug information for API calls. Pass this configuration property for Cloudera Manager in the deployment template:

 

enable_api_debug: true

 

There's an example for this in our reference configuration file for AWS:

 

https://github.com/cloudera/director-scripts/blob/master/configs/aws.reference.conf
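
Since the question uses the REST API rather than a configuration file, the same property would have to go into the JSON deployment template. Here is a minimal Python sketch of that, assuming the template carries a `configs` map keyed by `CLOUDERA_MANAGER` in the same way as the `cloudera-manager` section of the reference file. The helper name `with_api_debug` and the trimmed sample template are hypothetical, so check the field names against your Director version:

```python
import copy
import json


def with_api_debug(deployment_template):
    """Return a copy of the template with Cloudera Manager API debugging enabled.

    Assumption: Cloudera Manager server settings live under
    configs -> CLOUDERA_MANAGER, mirroring the reference configuration file.
    """
    template = copy.deepcopy(deployment_template)
    cm_configs = template.setdefault("configs", {}).setdefault("CLOUDERA_MANAGER", {})
    cm_configs["enable_api_debug"] = "true"
    return template


# Hypothetical, heavily trimmed template; a real one has many more fields.
template = {"name": "manager_live"}
print(json.dumps(with_api_debug(template), indent=2))
```

The helper copies the template instead of modifying it, so the original request body stays available for comparison.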

 

This may or may not be the issue, but you should start a cluster with at least three workers: each worker hosts an HDFS DataNode, and you need at least three of those to satisfy the default HDFS replication factor of 3.

Contributor

Hi Bill,

 

Thanks for the tips, setting "enable_api_debug: true" helped identify the issue. It seems that the Oozie service is a requirement for Hue. After I included Oozie, the creation ran successfully.
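
For anyone who wants to catch both points raised in this thread before submitting a cluster template, here is a rough pre-flight check in Python. It is only a sketch: the dependency map contains just the Hue/Oozie relationship reported above (it is not Cloudera Manager's full dependency list), and the function name and the way services are passed in are hypothetical:

```python
# Assumption: this map only records the dependency reported in this thread.
# It is not a complete list of Cloudera Manager service dependencies.
REQUIRED_SERVICES = {"HUE": {"OOZIE"}}

# Default HDFS replication factor (dfs.replication).
DEFAULT_HDFS_REPLICATION = 3


def preflight(services, worker_count, replication=DEFAULT_HDFS_REPLICATION):
    """Return a list of human-readable problems; an empty list means no known issue."""
    problems = []
    present = set(services)
    # Report every known dependency that is missing from the service list.
    for service in sorted(present):
        missing = REQUIRED_SERVICES.get(service, set()) - present
        for dep in sorted(missing):
            problems.append(f"{service} requires {dep}, which is not in the service list")
    # Each worker hosts one DataNode, so fewer workers than replicas is a problem.
    if "HDFS" in present and worker_count < replication:
        problems.append(
            f"{worker_count} worker(s) cannot hold {replication} replicas of each HDFS block"
        )
    return problems


for problem in preflight(["HDFS", "YARN", "HUE"], worker_count=1):
    print(problem)
```

A check like this does not replace "enable_api_debug: true", which is what shows the actual reason for the 400 response.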