Bootstrap fails with "Insufficient number of instances available in time 20 MINUTES"

Explorer

The bootstrap fails with "Insufficient number of instances available in time 20 MINUTES" even though all the requested instances and their EBS volumes are provisioned. I'm running Director 2.2.

 

[2017-07-26 13:42:12] INFO  [qtp614855935-17] - c.c.l.p.c.PluggableComputeClusterTemplateValidator: Validating virtual instances of cluster Spark-DataScience
[2017-07-26 13:42:12] INFO  [qtp614855935-17] - c.c.l.p.c.PluggableComputeInstanceTemplateValidator: Validating instance template for compute provider: aws
[2017-07-26 13:42:12] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: >> Describing all regions to find endpoint for 'us-east-1'
[2017-07-26 13:42:12] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: << Found endpoint 'ec2.us-east-1.amazonaws.com' for region 'us-east-1'
[2017-07-26 13:42:12] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: >> Describing all regions to find endpoint for 'us-east-1'
[2017-07-26 13:42:12] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: << Found endpoint 'ec2.us-east-1.amazonaws.com' for region 'us-east-1'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: Found EC2 key name cd-poc for fingerprint
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing AMI 'ami-08bf131e'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing subnet 'subnet-533c820a'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing security group 'sg-cdeabeb0'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing key pair
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.l.p.c.PluggableComputeInstanceTemplateValidator: Validating instance template for compute provider: aws
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: >> Describing all regions to find endpoint for 'us-east-1'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: << Found endpoint 'ec2.us-east-1.amazonaws.com' for region 'us-east-1'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: >> Describing all regions to find endpoint for 'us-east-1'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: << Found endpoint 'ec2.us-east-1.amazonaws.com' for region 'us-east-1'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: Found EC2 key name cd-poc for fingerprint
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing AMI 'ami-08bf131e'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing subnet 'subnet-533c820a'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing security group 'sg-cdeabeb0'
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing key pair
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.l.m.m.p.ClouderaManagerMetadata: No repository specified, using metadata for default Cloudera Manager version
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.l.b.v.GenericClusterTemplateValidator: No product version metadata available for CDH:5. Using current version metadata instead.
[2017-07-26 13:42:13] INFO  [qtp614855935-17] - c.c.l.p.DatabasePipelineService: Starting pipeline 'f2c751c1-48eb-443c-a9c3-e520cb9ce603' with root job com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob and listener com.cloudera.launchpad.api.listeners.pipeline.BootstrapClusterListener
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.d.ClusterRepositoryService: Cluster 'Spark-DataScience': BOOTSTRAPPING -> BOOTSTRAPPING
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> DefaultBootstrapClusterJob/4 [Environment{name='CapOne - Dev2 - CDH59 Environment', provider=InstanceProviderConfig{type='aws'},  ...
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26294, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> SetStatusJob/1 [Requesting 7 instance(s) in 2 group(s)]
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.launchpad.pipeline.AbstractJob: Requesting 7 instance(s) in 2 group(s)
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> ParallelForEachInBatches/4 [20, class com.cloudera.launchpad.bootstrap.AllocateInstances, [VirtualInstanceGroup{name='masters', ...
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.p.u.ParallelForEachInBatches: Generating batch for job class com.cloudera.launchpad.bootstrap.AllocateInstances of size 2
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26299, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> UnboundedParallelForEach/3 [class com.cloudera.launchpad.bootstrap.AllocateInstances, [VirtualInstanceGroup{name='masters', vir ...
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.p.DatabasePipelineService: Starting pipeline 'f2c751c1-48eb-443c-a9c3-e520cb9ce603/child-00000-93e490b7-6e18-4981-a9c5-7ee8105e67cc' with root job com.cloudera.launchpad.bootstrap.AllocateInstances and listener com.cloudera.launchpad.pipeline.listener.NoopPipelineStageListener
[2017-07-26 13:42:14] INFO  [pipeline-thread-4] - c.c.l.p.DatabasePipelineService: Starting pipeline 'f2c751c1-48eb-443c-a9c3-e520cb9ce603/child-00000-42ce02b3-727b-4df5-a15d-3f99305788d5' with root job com.cloudera.launchpad.bootstrap.AllocateInstances and listener com.cloudera.launchpad.pipeline.listener.NoopPipelineStageListener
[2017-07-26 13:42:15] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26310, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:15] INFO  [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> UnboundedWaitForAllPipelines/1 [[f2c751c1-48eb-443c-a9c3-e520cb9ce603/child-00000-93e490b7-6e18-4981-a9c5-7ee8105e67cc, f2c751c1-48 ...
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances/2 [VirtualInstanceGroup{name='masters', virtualInstances=[VirtualInstance{id='dbd5b101-667f-46db-956e- ...
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances/2 [VirtualInstanceGroup{name='workers', virtualInstances=[VirtualInstance{id='9a8baa82-4194-4867-9023- ...
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26319, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26320, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances$AllocateAndWaitForInstancesToRun/2 [VirtualInstanceGroup{name='workers', virtualInstances=[VirtualInstance{id='9a8baa82-4194-4867-9023- ...
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances$AllocateAndWaitForInstancesToRun/2 [VirtualInstanceGroup{name='masters', virtualInstances=[VirtualInstance{id='dbd5b101-667f-46db-956e- ...
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.l.bootstrap.AllocateInstances: Allocating 6 instances (min count 1) in group workers
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Allocating 1 instances (min count 1) in group masters
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: Found EC2 key name cd-poc for fingerprint
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Requesting 1 instances for com.cloudera.director.aws.ec2.EC2InstanceTemplate@1a5d2935
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Building instance requests
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Network interface specification: {DeviceIndex: 0,SubnetId: subnet-533c820a,Groups: [sg-cdeabeb0],DeleteOnTermination: true,PrivateIpAddresses: [],AssociatePublicIpAddress: false}
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Original image block device mappings: [{DeviceName: /dev/sda1,Ebs: {SnapshotId: snap-0c22e054999b5520f,VolumeSize: 50,DeleteOnTermination: true,VolumeType: gp2,Encrypted: false},}]
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Block device mappings: [{DeviceName: /dev/sda1,Ebs: {SnapshotId: snap-0c22e054999b5520f,VolumeSize: 75,DeleteOnTermination: true,VolumeType: gp2,},}]
[2017-07-26 13:42:15] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Instance request type: m4.large, image: ami-08bf131e, group size: 1
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: Found EC2 key name cd-poc for fingerprint
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Requesting 6 instances for com.cloudera.director.aws.ec2.EC2InstanceTemplate@2fc40cf8
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Building instance requests
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Network interface specification: {DeviceIndex: 0,SubnetId: subnet-533c820a,Groups: [sg-cdeabeb0],DeleteOnTermination: true,PrivateIpAddresses: [],AssociatePublicIpAddress: false}
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Original image block device mappings: [{DeviceName: /dev/sda1,Ebs: {SnapshotId: snap-0c22e054999b5520f,VolumeSize: 50,DeleteOnTermination: true,VolumeType: gp2,Encrypted: false},}]
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: EBS volumes will be allocated as part of instance launch request
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Block device mappings: [{DeviceName: /dev/sda1,Ebs: {SnapshotId: snap-0c22e054999b5520f,VolumeSize: 50,DeleteOnTermination: true,VolumeType: gp2,},}, {DeviceName: /dev/sdf,Ebs: {VolumeSize: 1792,DeleteOnTermination: true,VolumeType: st1,Encrypted: false},}]
[2017-07-26 13:42:15] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Instance request type: m4.2xlarge, image: ami-08bf131e, group size: 6
[2017-07-26 13:42:16] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: << Reservation r-0519ae93f01066f5f with Instance{id=i-0e91d6d581c37c4b5 privateIp=10.16.113.60}
[2017-07-26 13:42:16] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Tagging instance i-0e91d6d581c37c4b5 / dbd5b101-667f-46db-956e-0bd87431cbfa
[2017-07-26 13:42:16] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: << Reservation r-010b897d2842fb7c3 with Instance{id=i-00fc424904bfbab18 privateIp=10.16.113.157} Instance{id=i-051ee7afc61f6bee5 privateIp=10.16.113.79} Instance{id=i-065cdae4186725920 privateIp=10.16.113.167} Instance{id=i-02273a6e15782f7f3 privateIp=10.16.113.73} Instance{id=i-0db34bbdc9ecf8b7a privateIp=10.16.113.207} Instance{id=i-0fa73ede05ac38a26 privateIp=10.16.113.210}
[2017-07-26 13:42:16] INFO  [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Tagging instance i-00fc424904bfbab18 / 81b0bb5b-364d-4832-888e-bc1581b1ef68
[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: << Instance i-0e91d6d581c37c4b5 got IP 10.16.113.60
[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Waiting for 0 instances to start running
[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.l.p.c.PluggableComputeProvider: Waiting for 0 instances to be running
[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26327, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances$GetSuccessfulInstancesAndTerminateFailedInstances/4 [Environment{name='CapOne - Dev2 - CDH59 Environment', provider=InstanceProviderConfig{type='aws'},  ...
[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: All requested instances failed.
[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Minimum number of instances (1) not available. Terminating available instances (0) as well.
[2017-07-26 13:42:31] ERROR [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: Insufficient number of instances available in time 20 MINUTES

<snip>

[2017-07-26 13:42:35] INFO  [pipeline-thread-6] - c.c.l.bootstrap.AllocateInstances: All requested instances are available
[2017-07-26 13:42:35] INFO  [pipeline-thread-6] - c.c.l.bootstrap.AllocateInstances: Sufficient number of instances available (6/6)
1 ACCEPTED SOLUTION

Contributor

It means that Director failed to allocate the required instances. There should be exceptions logged in application.log that explain why the allocation failed. You might be able to tell the reason from CloudTrail as well.
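One way to hunt for the reason in application.log is to pull out every ERROR line together with the last few lines logged by the same thread, since the masters and workers groups are allocated on different pipeline threads and their messages interleave. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt in the question; the function name `errors_with_context` and the regular expression are illustrative, not part of Director.

```python
import re
import sys
from collections import defaultdict, deque

# Assumed line shape, taken from the excerpt in the question:
# [2017-07-26 13:42:31] ERROR [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+-\s+"
    r"(?P<logger>[^:\s]+):\s?(?P<msg>.*)$"
)


def errors_with_context(lines, context=5):
    """Yield (error_line, earlier_lines_from_same_thread) for each ERROR line."""
    recent = defaultdict(lambda: deque(maxlen=context))
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match is None:
            continue  # stack traces and wrapped continuation lines are skipped
        thread = match.group("thread")
        if match.group("level") == "ERROR":
            yield line, list(recent[thread])
        recent[thread].append(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py /path/to/application.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for error, before in errors_with_context(handle):
            print("\n".join(before))
            print(error)
            print("-" * 40)
```

Run against the log above, this would show the pipeline-thread-5 (masters) lines leading up to the failure without the pipeline-thread-6 (workers) lines mixed in.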


Explorer
I didn't see anything in the application log that seemed helpful. I completely forgot about CloudTrail; I'll see if there's something there. I did finally deduce that it was having problems reserving an m4.large instance. My m4.2xlarge workers were always getting provisioned, so I cut back to a single master running as an m4.2xlarge (it's a temp cluster for testing).

Contributor

Sorry, I apologize for the previous comment.

In the log:

[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: << Instance i-0e91d6d581c37c4b5 got IP 10.16.113.60
[2017-07-26 13:42:31] INFO  [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Waiting for 0 instances to start running

This means that the instance was created successfully (a private IP address was assigned), but a later describe call failed to find it because of EC2's eventual consistency. Retrying with the same configuration should eventually succeed. To avoid failing the pipeline, you can set the minimum count to 0 and choose retry under Modify Cluster if the same problem happens again.

The Director team is well aware of the problem and is working on it.
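To illustrate the eventual-consistency issue described above: a freshly launched instance may not be visible to a describe call for a short while, so a client usually polls with backoff instead of treating the first miss as a failure. The following is a minimal sketch of that pattern, not Director's actual code. `describe` is a hypothetical callable that takes a list of instance IDs and returns the ones the API currently reports; with a real AWS SDK it would wrap the describe-instances call and treat a not-found response as "not visible yet".

```python
import time


def wait_until_visible(describe, instance_ids, attempts=6, base_delay=1.0,
                       sleep=time.sleep):
    """Poll until every requested instance ID is reported by `describe`.

    `describe` is a hypothetical callable: list of IDs in, list of the IDs
    that are currently visible out. Returns True once all IDs have been
    seen, or False if some are still missing after `attempts` tries.
    """
    missing = set(instance_ids)
    for attempt in range(attempts):
        missing -= set(describe(sorted(missing)))
        if not missing:
            return True
        if attempt < attempts - 1:
            # Exponential backoff: 1s, 2s, 4s, ... between describe calls.
            sleep(base_delay * (2 ** attempt))
    return False
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; the attempt count and delays are arbitrary values chosen for the sketch.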

New Contributor

Hello,

I am having the same issue. The 3 instances get provisioned, but I get:

 

module.cdh.aws_instance.cdh_cl-instance (remote-exec): [2017-12-14 03:42:55] INFO  [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances$GetSuccessfulInstancesAndTerminateFailedInstances/4 [Environment{name='C5-Reference-AWS Environment', provider=InstanceProviderConfig{type='aws'}, crede ...
module.cdh.aws_instance.cdh_cl-instance (remote-exec): [2017-12-14 03:42:55] INFO  [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: All requested instances failed.
module.cdh.aws_instance.cdh_cl-instance (remote-exec): [2017-12-14 03:42:55] INFO  [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Minimum number of instances (1) not available. Terminating available instances (0) as well.
module.cdh.aws_instance.cdh_cl-instance (remote-exec): [2017-12-14 03:42:56] ERROR [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
module.cdh.aws_instance.cdh_cl-instance (remote-exec): com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: Insufficient number of instances available in time 20 MINUTES
module.cdh.aws_instance.cdh_cl-instance (remote-exec): 	at com.cloudera.launchpad.bootstrap.AllocateInstances$GetSuccessfulInstancesAndTerminateFailedInstances.run(AllocateInstances.java:294) ~[launchpad-bootstrap-1.5.0.jar!/:1.5.0]
module.cdh.aws_instance.cdh_cl-instance (remote-exec): 	at com.cloudera.launchpad.bootstrap.AllocateInstances$GetSuccessfulInstancesAndTerminateFailedInstances.run(AllocateInstances.java:253) ~[launchpad-bootstrap-1.5.0.jar!/:1.5.0]

 

I set minCount to 1, since setting it to 0 threw an error as well. I am attempting to do this via a script. If there's no fix yet, could you please tell me how to retry via the .conf file? I haven't succeeded so far.