Cloudera Director bootstrap failure: Cloudera Manager agent installation failure

Hi,

I am new to Cloudera and am trying to use Cloudera Director to spin up an AWS cluster to test Apache Spot, following the instructions given here:

https://blog.cloudera.com/blog/2018/02/apache-spot-incubating-and-cloudera-on-aws-in-60-minutes/


Having installed Cloudera Director according to those instructions, I find that the bootstrap command "cloudera-director bootstrap spot-director.conf" fails to complete.


Somewhere in the bootstrap process, the Cloudera Manager agent installation fails on several nodes in the cluster. I cannot tell from the logs why the installation is failing. After searching through similar questions posted in the forums, I verified that:

(a) All the instances that were spun up (Director, Manager and the remaining cluster nodes) have public internet connectivity, as well as connectivity to each other via their private IPs. As far as I can tell, the Amazon VPC and subnet configurations have been set up correctly.

(b) 'screen' was installed and set up correctly on all the instances.

All the instances were configured to run RHEL 7.5 (ami-18726478).
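
For anyone repeating the check in (a), a minimal sketch of a per-port TCP test follows. Reachability by ping or by SSH alone does not show that the ports Cloudera Manager uses are open. The port list is an assumption based on the usual Cloudera Manager defaults (22 for SSH, 7180 and 7182 on the Manager server, 9000 on agents), not something taken from this post or from spot-director.conf, so adjust it to your deployment. Run it from one instance against the private IPs of the others.

```python
import socket

# Assumed default ports (not confirmed by this thread); edit to match your setup.
# 7180/7182 are expected on the Cloudera Manager host, 9000 on agent hosts,
# and 22 on every host, so a "closed" result is only meaningful for the right host.
PORTS_TO_CHECK = {
    22: "SSH, used to install agents",
    7180: "Cloudera Manager web UI",
    7182: "Cloudera Manager agent heartbeat listener",
    9000: "Cloudera Manager agent status server",
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_host(host, ports=PORTS_TO_CHECK, timeout=3.0):
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    import sys

    # Usage: python check_ports.py <private-ip> [<private-ip> ...]
    for host in sys.argv[1:]:
        for port, ok in check_host(host).items():
            state = "open" if ok else "blocked or closed"
            print(f"{host}:{port} {state}")
```

A port reported as blocked or closed from inside the VPC, while the service is known to be running on the target, points at a security group or network ACL rule rather than at the host itself.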

 

Here is the output of the command "cloudera-director bootstrap spot-director.conf" up to the point where I terminated it:

Removing all local state ... done
Installing Cloudera Manager ...
* Starting ...... done
* Requesting an instance for Cloudera Manager ............................. done

------------

* Starting Cloudera Manager server .... done
* Waiting for Cloudera Manager server to start ..... done
* Changing admin Credentials for Cloudera Manager .... done
* Setting Cloudera Manager License ... done
* Enabling Enterprise Trial ... done
* Configuring Cloudera Manager ...... done
* Deploying Cloudera Manager agent ....... done
* Waiting for Cloudera Manager to deploy agent on 172.31.26.4 ..... done
* Setting up Cloudera Management Services .............. done
* Backing up Cloudera Manager Server configuration ........... done
* Inspecting capabilities of 172.31.26.4 ...... done
* Running Deployment post create scripts .... done
* Starting ... done
* Done ...
Cloudera Manager ready.
Creating cluster apache-spot ...
* Starting ........ done
* Requesting 5 instance(s) in 3 group(s) ............................ done
* Preparing instances in parallel (20 at a time) ............................................................................................................................... done
* Waiting for Cloudera Manager installation to complete .... done
* Installing Cloudera Manager agents on all instances in parallel (20 at a time) ........................................................................................................
.........................................................................................................................................................................................
.........................................................................................................................................................................................
............................................................................................................................................................. done
* Removing any failed instances before continuing ... done

 

The nodes in the cluster where installation failed don't seem to have any logs that I can look at. These nodes also get terminated by the bootstrap script.

 

Any help on how to debug/resolve the issue would be greatly appreciated!

 

Thanks

1 ACCEPTED SOLUTION

Found the answer. It appears that some connectivity was broken between the Cloudera Manager server and the client nodes.

 

Once I relaxed the security group inbound rules to allow all traffic (and spun up the entire cluster using this security group), I no longer faced this particular problem.
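
If opening the group to all traffic from anywhere is broader than you want, one narrower variant is a self-referencing rule: allow all protocols and ports, but only from instances that belong to the same security group. The sketch below is an assumption about how that could be done with boto3, not what was actually run in this thread. The group ID and region are placeholders, and you would still need a separate rule for SSH access from your own machine.

```python
def self_referencing_rule(group_id):
    """Build an ingress rule allowing all traffic from members of the same group."""
    return [
        {
            "IpProtocol": "-1",  # -1 means every protocol and port
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
    ]


def allow_intra_group_traffic(group_id, region):
    """Apply the rule with boto3 (third-party; needs AWS credentials configured)."""
    import boto3  # imported here so the rule builder works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=self_referencing_rule(group_id),
    )


# Hypothetical usage; substitute your own group ID and region:
# allow_intra_group_traffic("sg-0123456789abcdef0", "us-west-2")
```

Every instance in the cluster, including the Director and Manager hosts, has to be launched with this security group for the rule to cover them.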
