Created 07-07-2017 07:37 PM
I am trying to launch HDCloud on AWS. I have tried three or four times and it fails consistently.
"InstanceWaitCondition" : {
  "Type" : "AWS::CloudFormation::WaitCondition",
  "DependsOn" : "Cloudbreak",
  "Properties" : {
    "Handle" : { "Ref" : "InstanceWaitHandle" },
    "Timeout" : 36000
  }
},
I am not sure how to debug this and find the root cause. Any input is highly appreciated.
Created 07-07-2017 07:38 PM
Attaching AWS console cloud formation - status page
Created 07-08-2017 08:44 PM
Do your networking settings allow the RDS to communicate with the cloud controller? See
There should be a more explicit error message if the connection to RDS could not be established, but I am just making sure.
Which version of HDCloud are you using?
Created 07-11-2017 09:43 PM
-----> main entry point
-----> restoring motd
-----> retrieving metadata
-----> retrieving region
-----> get profile attribute from cfn metadata of logical resource: Cloudbreak
-----> public ip is: ec2-34-212-149-137.us-west-2.compute.amazonaws.com
-----> using hostname
-----> using existing VPC: true
-----> described internet gateway: igw-ee79d68b
-----> fill Profile
-----> wait for docker ...
-----> check RDS
-----> using RDS: hdc-test.cluster-czvrt6ojpbos.us-west-2.rds.amazonaws.com:3306
-----> checking RDS connectivity
/var/lib/cloud/instance/scripts/part-001: line 287: 3: required
-----> installation failed: ERROR: command 'declare host=${1:?required} port=${2:?required} user=${3:?required} password=${4:?required} dbname=${5:?required}' exited with status: 1 line: 1
Error signaling CloudFormation: [Errno None] ('Connection aborted.', gaierror(-3, 'Temporary failure in name resolution'))
Created 07-12-2017 06:09 PM
See step 4 in https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.16.0/bk_hdcloud-aws/content/advanc... in case these guidelines help.
I always (1) use the same VPC for RDS and cloud controller and (2) I initially set the Inbound rule on the security group to "0.0.0.0/0" just to avoid any connection errors (since we don't know the IP address of the cloud controller at the point when we create RDS and define the security group settings).
What do you think @Tamas Bihari ?
Created 07-13-2017 04:34 PM
From the last error I think @Dominika Bialek is right and the HDC Controller could not connect to the RDS service because of network or security rule limitations.
On the other hand, could you please try the https://aws.amazon.com/marketplace/pp/B01LXOQBOU?qid=1499963967598&sr=0-2&ref_=srh_res_product_title templates? From the attached pdf and the logs in the last comment it looks like you are still on version 1.14.4.
You could also check from the controller whether the RDS service is reachable by running the following commands.
Check the domain name:
nslookup hdc-test.cluster-czvrt6ojpbos.us-west-2.rds.amazonaws.com
Check the port is open on the specified machine:
telnet hdc-test.cluster-czvrt6ojpbos.us-west-2.rds.amazonaws.com 3306
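If nslookup or telnet is not installed on the controller, the same two checks can be approximated with the Python standard library. This is only a minimal sketch: the function names check_dns and check_port are made up for illustration and are not part of cbd or any Hortonworks tooling.

```python
import socket


def check_dns(hostname):
    """Return the sorted IP addresses the name resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Same class of failure as "Temporary failure in name resolution".
        return []
    return sorted({info[4][0] for info in infos})


def check_port(hostname, port, timeout=5.0):
    """Return True if a TCP connection to hostname:port succeeds within the timeout."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS errors.
        return False
```

For example, check_dns("hdc-test.cluster-czvrt6ojpbos.us-west-2.rds.amazonaws.com") followed by check_port with the same host and port 3306 mirrors the two commands above. An empty list from check_dns points at DNS, while a resolved name with a failing port check points at security groups or routing.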
Br, Tamas
Created 07-10-2017 07:35 AM
Could you please check what @Dominika Bialek recommended?
On the other hand, if you create a new deployment through the Cloudformation wizard, please set the value of Options -> Advanced -> "Roll back on failure" to false. Then Cloudformation won't roll back the resources when something fails, and you will be able to SSH to your instance and check the logs of the deployment in the folder "/var/lib/cloudbreak-deployment" by running "cbd logs". If you create an HDC deployment with these additional settings, please attach those logs and also the output of the "cbd ps" command.
Thanks,
Tamas
Created 07-18-2017 01:19 AM
Thanks @Dominika Bialek and @Tamas Bihari for your feedback.
We have identified the root cause: the instance seems unable to resolve DNS names.
[root@ip-172-17-245-9 ~]# traceroute google.com
google.com: Temporary failure in name resolution
[root@ip-172-17-245-9 ~]# traceroute google.com
traceroute to google.com (216.58.193.78), 30 hops max, 60 byte packets
 1  * * *
 2  ec2-50-112-0-108.us-west-2.compute.amazonaws.com (50.112.0.108) 18.287 ms ec2-50-112-0-106.us-west-2.compute.amazonaws.com (50.112.0.106) 15.908 ms 15.802 ms
Created 07-18-2017 09:15 AM
Probably the best way to debug this issue is to create a deployment with Options -> Advanced -> "Rollback on failure" set to false. In that case the deployment can be deleted manually in the AWS Cloudformation service after debugging is finished. This way you can check the applied CF template and the created events and resources in the Cloudformation service.
I checked the referenced template: when we create the Cloudformation template we only use references to resources, except for the public route. That route is dedicated to allowing outgoing connections from the created cbd instance. In your specific network setup that part is probably not working as we expected, so please check the route table and its rules. I guess there are rules that block the outgoing connections, or the Cloudformation template references the wrong gateway and route table in your VPC.
"VPC" : {
  "Type" : "AWS::EC2::VPC",
  "Properties" : {
    "CidrBlock" : "10.0.0.0/16",
    "EnableDnsSupport" : "true",
    "EnableDnsHostnames" : "true",
    "Tags" : [ { "Key" : "Application", "Value" : { "Ref" : "AWS::StackId" } } ]
  }
},
"PublicSubnet" : {
  "Type" : "AWS::EC2::Subnet",
  "Properties" : {
    "MapPublicIpOnLaunch" : true,
    "VpcId" : { "Ref" : "VPC" },
    "CidrBlock" : "10.0.0.0/24",
    "Tags" : [ { "Key" : "Application", "Value" : { "Ref" : "AWS::StackId" } } ]
  }
},
"InternetGateway" : {
  "Type" : "AWS::EC2::InternetGateway",
  "Properties" : {
    "Tags" : [ { "Key" : "Application", "Value" : { "Ref" : "AWS::StackId" } } ]
  }
},
"AttachGateway" : {
  "Type" : "AWS::EC2::VPCGatewayAttachment",
  "Properties" : {
    "VpcId" : { "Ref" : "VPC" },
    "InternetGatewayId" : { "Ref" : "InternetGateway" }
  }
},
"PublicRouteTable" : {
  "Type" : "AWS::EC2::RouteTable",
  "Properties" : {
    "VpcId" : { "Ref" : "VPC" },
    "Tags" : [ { "Key" : "Application", "Value" : { "Ref" : "AWS::StackId" } } ]
  }
},
"PublicRoute" : {
  "Type" : "AWS::EC2::Route",
  "DependsOn" : [ "PublicRouteTable", "AttachGateway" ],
  "Properties" : {
    "RouteTableId" : { "Ref" : "PublicRouteTable" },
    "DestinationCidrBlock" : "0.0.0.0/0",
    "GatewayId" : { "Ref" : "InternetGateway" }
  }
},
"PublicSubnetRouteTableAssociation" : {
  "Type" : "AWS::EC2::SubnetRouteTableAssociation",
  "Properties" : {
    "SubnetId" : { "Ref" : "PublicSubnet" },
    "RouteTableId" : { "Ref" : "PublicRouteTable" }
  }
},
Created 07-18-2017 09:52 PM
@Tamas Bihari How do I see the applied CF template? I turned off the rollback option to debug. Do you know where the template is stored on the instance?
Another point: in our AWS test account the VPC CIDR is 172.16.0.0/x, but in your example above the CIDR block is 10.0.0.0/16. Could that be causing the problem with the public gateway?
Created 07-28-2017 08:19 AM
The public (internet) gateway has to be added to the routing table. Check your routing tables and see whether such a route exists (it probably won't be created by the template if you are using an existing VPC). Please see the "Enabling Internet Access" section of this AWS documentation page: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Internet_Gateway.html
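To make the routing table check concrete, here is a rough sketch that inspects the JSON printed by "aws ec2 describe-route-tables" (for example saved to a file and loaded with json.load) and reports where each table sends 0.0.0.0/0. The function names are hypothetical helpers written for this thread, and the sketch assumes the usual output shape with "RouteTables", "Routes", "DestinationCidrBlock", "GatewayId" and "State" keys.

```python
def default_route_targets(describe_output):
    """Map each route table ID to the target of its active 0.0.0.0/0 route, or None."""
    result = {}
    for table in describe_output.get("RouteTables", []):
        target = None
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") != "0.0.0.0/0":
                continue
            # A "blackhole" route exists but does not forward traffic.
            if route.get("State", "active") != "active":
                continue
            target = (route.get("GatewayId")
                      or route.get("NatGatewayId")
                      or route.get("InstanceId"))
        result[table["RouteTableId"]] = target
    return result


def has_internet_gateway_route(describe_output, route_table_id):
    """Return True if the given table has an active default route to an igw-* gateway."""
    target = default_route_targets(describe_output).get(route_table_id)
    return bool(target) and target.startswith("igw-")
```

A table that comes back with None has no active default route at all, so instances in subnets associated with it cannot reach the internet through that table.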
Other things that may help:
Egress only gateway: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/egress-only-internet-gateway.html
DNS resolution: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html
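On the DNS point: the Amazon-provided DNS server sits at the base of the VPC's primary IPv4 range plus two (10.0.0.2 for 10.0.0.0/16) and is also reachable at 169.254.169.253. The sketch below computes that address for a given CIDR and checks whether the contents of /etc/resolv.conf point at it. The helper names are made up for illustration, and since the real prefix length of the 172.16.0.0/x VPC is not known here, any CIDR passed in is your own input.

```python
import ipaddress


def expected_vpc_resolver(vpc_cidr):
    """Return the Amazon-provided DNS address for a VPC CIDR (network base + 2)."""
    network = ipaddress.ip_network(vpc_cidr)
    return str(network.network_address + 2)


def parse_nameservers(resolv_conf_text):
    """Extract the nameserver addresses from the text of a resolv.conf file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_vpc_resolver(resolv_conf_text, vpc_cidr):
    """Return True if resolv.conf lists the VPC resolver or the link-local alias."""
    servers = parse_nameservers(resolv_conf_text)
    return expected_vpc_resolver(vpc_cidr) in servers or "169.254.169.253" in servers
```

If the instance lists a different nameserver, or the VPC has DNS support disabled, name resolution can fail even when the route table and internet gateway are fine.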
Created 07-19-2017 12:43 PM
@Anandha L Ranganathan You can find the CF template in the Cloudformation view if you select a stack and choose the template tab. Just to be sure: did you try to launch instances in the same VPC / subnet as the instances launched by Cloudbreak? From these instances, were you able to telnet to the RDS instances? Were you able to ping the outside world and other AWS instances in the same VPC / subnet?
Created 07-19-2017 03:32 PM
It looks like there is definitely a routing issue and/or a network ACL that needs to be added or changed. It may be a good idea to ask AWS support why hosts are having trouble communicating from subnet to subnet. I've run into similar situations in AWS when manually setting up services to talk to EMR clusters. Once the cause is found, find out how to set this in your template for future use (I am not that familiar with configuring the templates yet).
Created 07-27-2017 10:33 PM
@Anandha L Ranganathan Were you able to resolve the issue? I see in the other post that you managed to get to the create cluster stage? What did you do to solve the problem?
Created 07-30-2017 02:31 AM
@Dominika Bialek I am still unable to launch Cloudbreak in the Oregon region. I was able to launch Cloudbreak successfully in Virginia, but our Virginia region doesn't have a proper VPC and subnet setup. In the Oregon region we have multiple subnets (application, public, and mgmt). I tried all three subnets, but the Cloudbreak installation failed in each. We compared other systems in the same region / availability zone side by side and everything looks similar; we haven't found any differences between the instances launched by Cloudbreak and our systems. I checked with our IT security team and everything looks good in the routing table, subnet, and other components. We couldn't figure out the problem.