Created on 07-26-2017 07:00 AM - edited 09-16-2022 08:45 AM
The bootstrap fails with "Insufficient number of instances available in time 20 MINUTES" even though all the requested instances and their EBS volumes are provisioned. I'm running Director 2.2.
[2017-07-26 13:42:12] INFO [qtp614855935-17] - c.c.l.p.c.PluggableComputeClusterTemplateValidator: Validating virtual instances of cluster Spark-DataScience
[2017-07-26 13:42:12] INFO [qtp614855935-17] - c.c.l.p.c.PluggableComputeInstanceTemplateValidator: Validating instance template for compute provider: aws
[2017-07-26 13:42:12] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: >> Describing all regions to find endpoint for 'us-east-1'
[2017-07-26 13:42:12] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: << Found endpoint 'ec2.us-east-1.amazonaws.com' for region 'us-east-1'
[2017-07-26 13:42:12] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: >> Describing all regions to find endpoint for 'us-east-1'
[2017-07-26 13:42:12] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: << Found endpoint 'ec2.us-east-1.amazonaws.com' for region 'us-east-1'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: Found EC2 key name cd-poc for fingerprint
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing AMI 'ami-08bf131e'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing subnet 'subnet-533c820a'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing security group 'sg-cdeabeb0'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing key pair
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.l.p.c.PluggableComputeInstanceTemplateValidator: Validating instance template for compute provider: aws
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: >> Describing all regions to find endpoint for 'us-east-1'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: << Found endpoint 'ec2.us-east-1.amazonaws.com' for region 'us-east-1'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: >> Describing all regions to find endpoint for 'us-east-1'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: << Found endpoint 'ec2.us-east-1.amazonaws.com' for region 'us-east-1'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.director.aws.ec2.EC2Provider: Found EC2 key name cd-poc for fingerprint
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing AMI 'ami-08bf131e'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing subnet 'subnet-533c820a'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing security group 'sg-cdeabeb0'
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.d.a.e.EC2InstanceTemplateConfigurationValidator: >> Describing key pair
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.l.m.m.p.ClouderaManagerMetadata: No repository specified, using metadata for default Cloudera Manager version
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.l.b.v.GenericClusterTemplateValidator: No product version metadata available for CDH:5. Using current version metadata instead.
[2017-07-26 13:42:13] INFO [qtp614855935-17] - c.c.l.p.DatabasePipelineService: Starting pipeline 'f2c751c1-48eb-443c-a9c3-e520cb9ce603' with root job com.cloudera.launchpad.api.jobs.DefaultBootstrapClusterJob and listener com.cloudera.launchpad.api.listeners.pipeline.BootstrapClusterListener
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.d.ClusterRepositoryService: Cluster 'Spark-DataScience': BOOTSTRAPPING -> BOOTSTRAPPING
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> DefaultBootstrapClusterJob/4 [Environment{name='CapOne - Dev2 - CDH59 Environment', provider=InstanceProviderConfig{type='aws'}, ...
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26294, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> SetStatusJob/1 [Requesting 7 instance(s) in 2 group(s)]
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.launchpad.pipeline.AbstractJob: Requesting 7 instance(s) in 2 group(s)
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: << None{}
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> ParallelForEachInBatches/4 [20, class com.cloudera.launchpad.bootstrap.AllocateInstances, [VirtualInstanceGroup{name='masters', ...
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.p.u.ParallelForEachInBatches: Generating batch for job class com.cloudera.launchpad.bootstrap.AllocateInstances of size 2
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26299, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> UnboundedParallelForEach/3 [class com.cloudera.launchpad.bootstrap.AllocateInstances, [VirtualInstanceGroup{name='masters', vir ...
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.p.DatabasePipelineService: Starting pipeline 'f2c751c1-48eb-443c-a9c3-e520cb9ce603/child-00000-93e490b7-6e18-4981-a9c5-7ee8105e67cc' with root job com.cloudera.launchpad.bootstrap.AllocateInstances and listener com.cloudera.launchpad.pipeline.listener.NoopPipelineStageListener
[2017-07-26 13:42:14] INFO [pipeline-thread-4] - c.c.l.p.DatabasePipelineService: Starting pipeline 'f2c751c1-48eb-443c-a9c3-e520cb9ce603/child-00000-42ce02b3-727b-4df5-a15d-3f99305788d5' with root job com.cloudera.launchpad.bootstrap.AllocateInstances and listener com.cloudera.launchpad.pipeline.listener.NoopPipelineStageListener
[2017-07-26 13:42:15] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26310, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:15] INFO [pipeline-thread-4] - c.c.l.pipeline.util.PipelineRunner: >> UnboundedWaitForAllPipelines/1 [[f2c751c1-48eb-443c-a9c3-e520cb9ce603/child-00000-93e490b7-6e18-4981-a9c5-7ee8105e67cc, f2c751c1-48 ...
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances/2 [VirtualInstanceGroup{name='masters', virtualInstances=[VirtualInstance{id='dbd5b101-667f-46db-956e- ...
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances/2 [VirtualInstanceGroup{name='workers', virtualInstances=[VirtualInstance{id='9a8baa82-4194-4867-9023- ...
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26319, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26320, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances$AllocateAndWaitForInstancesToRun/2 [VirtualInstanceGroup{name='workers', virtualInstances=[VirtualInstance{id='9a8baa82-4194-4867-9023- ...
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances$AllocateAndWaitForInstancesToRun/2 [VirtualInstanceGroup{name='masters', virtualInstances=[VirtualInstance{id='dbd5b101-667f-46db-956e- ...
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.l.bootstrap.AllocateInstances: Allocating 6 instances (min count 1) in group workers
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Allocating 1 instances (min count 1) in group masters
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: Found EC2 key name cd-poc for fingerprint
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Requesting 1 instances for com.cloudera.director.aws.ec2.EC2InstanceTemplate@1a5d2935
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Building instance requests
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Network interface specification: {DeviceIndex: 0,SubnetId: subnet-533c820a,Groups: [sg-cdeabeb0],DeleteOnTermination: true,PrivateIpAddresses: [],AssociatePublicIpAddress: false}
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Original image block device mappings: [{DeviceName: /dev/sda1,Ebs: {SnapshotId: snap-0c22e054999b5520f,VolumeSize: 50,DeleteOnTermination: true,VolumeType: gp2,Encrypted: false},}]
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Block device mappings: [{DeviceName: /dev/sda1,Ebs: {SnapshotId: snap-0c22e054999b5520f,VolumeSize: 75,DeleteOnTermination: true,VolumeType: gp2,},}]
[2017-07-26 13:42:15] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Instance request type: m4.large, image: ami-08bf131e, group size: 1
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: Found EC2 key name cd-poc for fingerprint
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Requesting 6 instances for com.cloudera.director.aws.ec2.EC2InstanceTemplate@2fc40cf8
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Building instance requests
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Network interface specification: {DeviceIndex: 0,SubnetId: subnet-533c820a,Groups: [sg-cdeabeb0],DeleteOnTermination: true,PrivateIpAddresses: [],AssociatePublicIpAddress: false}
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Original image block device mappings: [{DeviceName: /dev/sda1,Ebs: {SnapshotId: snap-0c22e054999b5520f,VolumeSize: 50,DeleteOnTermination: true,VolumeType: gp2,Encrypted: false},}]
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: EBS volumes will be allocated as part of instance launch request
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Block device mappings: [{DeviceName: /dev/sda1,Ebs: {SnapshotId: snap-0c22e054999b5520f,VolumeSize: 50,DeleteOnTermination: true,VolumeType: gp2,},}, {DeviceName: /dev/sdf,Ebs: {VolumeSize: 1792,DeleteOnTermination: true,VolumeType: st1,Encrypted: false},}]
[2017-07-26 13:42:15] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Instance request type: m4.2xlarge, image: ami-08bf131e, group size: 6
[2017-07-26 13:42:16] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: << Reservation r-0519ae93f01066f5f with Instance{id=i-0e91d6d581c37c4b5 privateIp=10.16.113.60}
[2017-07-26 13:42:16] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: >> Tagging instance i-0e91d6d581c37c4b5 / dbd5b101-667f-46db-956e-0bd87431cbfa
[2017-07-26 13:42:16] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: << Reservation r-010b897d2842fb7c3 with Instance{id=i-00fc424904bfbab18 privateIp=10.16.113.157} Instance{id=i-051ee7afc61f6bee5 privateIp=10.16.113.79} Instance{id=i-065cdae4186725920 privateIp=10.16.113.167} Instance{id=i-02273a6e15782f7f3 privateIp=10.16.113.73} Instance{id=i-0db34bbdc9ecf8b7a privateIp=10.16.113.207} Instance{id=i-0fa73ede05ac38a26 privateIp=10.16.113.210}
[2017-07-26 13:42:16] INFO [pipeline-thread-6] - c.c.director.aws.ec2.EC2Provider: >> Tagging instance i-00fc424904bfbab18 / 81b0bb5b-364d-4832-888e-bc1581b1ef68
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: << Instance i-0e91d6d581c37c4b5 got IP 10.16.113.60
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Waiting for 0 instances to start running
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.l.p.c.PluggableComputeProvider: Waiting for 0 instances to be running
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: << DatabaseValue{delegate=PersistentValueEntity{id=26327, pipeline=f2c751c1-48eb-443c-a9c3-e520cb9ce603 ...
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances$GetSuccessfulInstancesAndTerminateFailedInstances/4 [Environment{name='CapOne - Dev2 - CDH59 Environment', provider=InstanceProviderConfig{type='aws'}, ...
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: All requested instances failed.
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Minimum number of instances (1) not available. Terminating available instances (0) as well.
[2017-07-26 13:42:31] ERROR [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: Insufficient number of instances available in time 20 MINUTES
<snip>
[2017-07-26 13:42:35] INFO [pipeline-thread-6] - c.c.l.bootstrap.AllocateInstances: All requested instances are available
[2017-07-26 13:42:35] INFO [pipeline-thread-6] - c.c.l.bootstrap.AllocateInstances: Sufficient number of instances available (6/6)
Created 07-26-2017 11:41 AM
It means that Director failed to allocate the required instances. There should be exceptions logged in application.log that explain why the allocation failed. You might also be able to tell the reason from CloudTrail.
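As a rough aid for digging through application.log, here is a minimal Python sketch that pulls out ERROR entries together with the lines that follow them (typically the exception and stack trace). It assumes the entry layout seen in the excerpts in this thread (`[timestamp] LEVEL [thread] - message`); the function name `find_errors` and the regular expression are illustrative assumptions, not part of Director.

```python
import re

# Assumed entry layout, based on the log excerpts in this thread:
# [2017-07-26 13:42:31] ERROR [pipeline-thread-5] - logger: message
ENTRY = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\] - (?P<msg>.*)$"
)


def find_errors(lines):
    """Return ERROR entries, each with its continuation lines attached."""
    errors = []
    current = None  # the ERROR entry currently collecting continuation lines
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            # A new log entry starts; stop collecting for the previous one.
            current = None
            if match.group("level") == "ERROR":
                current = {
                    "ts": match.group("ts"),
                    "thread": match.group("thread"),
                    "msg": match.group("msg"),
                    "trace": [],
                }
                errors.append(current)
        elif current is not None and line.strip():
            # Non-entry line after an ERROR: exception text or stack frame.
            current["trace"].append(line.strip())
    return errors
```

To use it, open your application.log (its location depends on your installation) and pass the file object to `find_errors`, then print each entry's `ts`, `thread`, `msg`, and `trace`.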
Created 07-26-2017 11:53 AM
Created 07-27-2017 02:10 PM
Sorry, I apologize for the previous comment.
In the log:
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.director.aws.ec2.EC2Provider: << Instance i-0e91d6d581c37c4b5 got IP 10.16.113.60
[2017-07-26 13:42:31] INFO [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Waiting for 0 instances to start running
It means that the instance was successfully created (a private IP address was assigned), but a later describe call failed to find the instance because of EC2's eventual consistency. Retrying with the same conf should eventually succeed. To avoid failing the pipeline, you can set the minimum count to 0 and choose retry in Modify Cluster if the same problem happens again.
The Director team is well aware of the problem and is working on it.
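To illustrate the eventual-consistency issue described above: a describe call issued right after launch may not see the new instance yet, so a client generally has to retry for a while before concluding that the instance is missing. The Python sketch below shows that general pattern with exponential backoff. It is not Director's implementation; `describe_with_retry`, `InstanceNotFound`, and the `describe` callable are hypothetical names used only for illustration.

```python
import time


class InstanceNotFound(Exception):
    """Hypothetical error: the describe call does not see the instance yet."""


def describe_with_retry(describe, instance_id, attempts=5, base_delay=1.0,
                        sleep=time.sleep):
    """Call describe(instance_id), retrying while the instance is not visible.

    describe is any callable that returns instance details or raises
    InstanceNotFound. The delay doubles after each failed attempt.
    """
    for attempt in range(attempts):
        try:
            return describe(instance_id)
        except InstanceNotFound:
            if attempt == attempts - 1:
                # Out of attempts: surface the failure to the caller.
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff can be exercised without real waiting; in normal use the default `time.sleep` applies.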
Created 12-14-2017 01:07 AM
Hello
I am having the same issue: the 3 instances get provisioned, but I get:
module.cdh.aws_instance.cdh_cl-instance (remote-exec): [2017-12-14 03:42:55] INFO [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: >> AllocateInstances$GetSuccessfulInstancesAndTerminateFailedInstances/4 [Environment{name='C5-Reference-AWS Environment', provider=InstanceProviderConfig{type='aws'}, crede ...
module.cdh.aws_instance.cdh_cl-instance (remote-exec): [2017-12-14 03:42:55] INFO [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: All requested instances failed.
module.cdh.aws_instance.cdh_cl-instance (remote-exec): [2017-12-14 03:42:55] INFO [pipeline-thread-5] - c.c.l.bootstrap.AllocateInstances: Minimum number of instances (1) not available. Terminating available instances (0) as well.
module.cdh.aws_instance.cdh_cl-instance (remote-exec): [2017-12-14 03:42:56] ERROR [pipeline-thread-5] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
module.cdh.aws_instance.cdh_cl-instance (remote-exec): com.cloudera.launchpad.pipeline.UnrecoverablePipelineError: Insufficient number of instances available in time 20 MINUTES
module.cdh.aws_instance.cdh_cl-instance (remote-exec): at com.cloudera.launchpad.bootstrap.AllocateInstances$GetSuccessfulInstancesAndTerminateFailedInstances.run(AllocateInstances.java:294) ~[launchpad-bootstrap-1.5.0.jar!/:1.5.0]
module.cdh.aws_instance.cdh_cl-instance (remote-exec): at com.cloudera.launchpad.bootstrap.AllocateInstances$GetSuccessfulInstancesAndTerminateFailedInstances.run(AllocateInstances.java:253) ~[launchpad-bootstrap-1.5.0.jar!/:1.5.0]
I set minCount to 1, because setting it to 0 threw an error as well. I am attempting to do this via a script. If there's no fix yet, could you please tell me how to retry via the .conf file? I haven't succeeded so far.