When does Cloudbreak communicate with public IPs? I assume at some point it fetches the public repos. For example, a security team may not want the process to communicate with any public IPs. What are the workarounds to handle this scenario in Cloudbreak?
The next release will include a feature that lets you deploy nodes into an existing subnet that cannot assign public IPs to the machines. The machines can then reach the internet through a NAT gateway, and you can log in to them over a VPN connection.
On the Cloudbreak side there are two things that require an internet connection:
- SSSD configuration
- Public recipes
So if you skip them, Cloudbreak should work as well. Ambari does the rest. In Cloudbreak you can configure the HDP repository, so if you create a large local repo that contains everything Ambari needs, it should work. For more details please ask the Ambari team, because Ambari installs many things at runtime, for example updates and patches.
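As a rough sanity check before pointing a cluster at a local repo, one could verify that the repo URL really resolves to an address inside the private network. The following is only a sketch: the function name `repo_host_is_private` and the example URL are hypothetical and not part of Cloudbreak or Ambari, and it assumes the repo is given as a plain HTTP(S) base URL.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def repo_host_is_private(repo_url: str) -> bool:
    """Return True if the host of repo_url is a private (non-public) address.

    Hypothetical helper for illustration; not a Cloudbreak or Ambari API.
    """
    host = urlparse(repo_url).hostname
    if host is None:
        raise ValueError(f"no host found in URL: {repo_url!r}")
    try:
        # The host is already an IP literal, e.g. 10.0.0.5
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Otherwise resolve the name using the DNS visible from this machine
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    return addr.is_private
```

For example, `repo_host_is_private("http://10.0.0.5/hdp/centos7/")` returns `True`, while a URL on a public address returns `False`. Note that this only checks the repo host itself; it says nothing about what Ambari may try to download at runtime.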
@rkovacs Where would the repos be loaded in advance? The only node that is static in Cloudbreak is the deployer node. How would the instances launched by Cloudbreak use repos that exist on the deployer node?
@Sunile Manjee Cloudbreak writes the configured repo next to Ambari if there is, and Ambari does the install on it's own way. We don't have a repository on the deployer node (we don't have repos at all); you have to set up your own or use the public repo.
@rkovacs Forgive me, I am not following you. I don't see any option during a cluster deployment via Cloudbreak to have Ambari use local repos. Where can I find more (and clearer) instructions on deploying a cluster via Cloudbreak and configuring it to use local repos instead of fetching from public ones?
What does this mean: "Cloudbreak writes the configured repo next to Ambari if there is, and Ambari does the install on it's own way."
@rkovacs Well, there you go! Nice. So this solves 1 of 2. The second part: does Cloudbreak reach out to any other public IPs for any activity? Basically, if I create a VPC and harden the security group to have zero access to any public IP, will the cluster launch and be usable? I guess I can test this as well.
Yes, Cloudbreak can install a cluster without an internet connection if you set the local repo, skip SSSD, and live with the limitations on recipes. But you still have to install Cloudbreak itself, which is impossible without an internet connection. The easiest way to install and manage a Cloudbreak installation is to use Cloudbreak Deployer. However, some of its commands download content, such as init, upgrade, version, and doctor. So first you have to install the deployer with an internet connection: for example, create an AMI that contains an already installed deployer and Cloudbreak, and then start an instance of that AMI in the secured network.
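The conditions above can be written down as a small pre-flight checklist. This is only an illustrative sketch of the reasoning in this thread: `ClusterPlan` and `offline_blockers` are hypothetical names, not Cloudbreak settings or APIs, and the flags have to be filled in by hand from your own deployment plan.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterPlan:
    """Hypothetical summary of a planned deployment (not a Cloudbreak object)."""
    local_repo_configured: bool   # a local HDP/Ambari repo is set in Cloudbreak
    uses_sssd: bool               # SSSD configuration is requested
    uses_public_recipes: bool     # recipes are fetched from public locations
    deployer_preinstalled: bool   # deployer and Cloudbreak are baked into the image


def offline_blockers(plan: ClusterPlan) -> List[str]:
    """List what still requires internet access, per the points in this thread."""
    blockers = []
    if not plan.local_repo_configured:
        blockers.append("configure a local repo instead of the public one")
    if plan.uses_sssd:
        blockers.append("skip the SSSD configuration")
    if plan.uses_public_recipes:
        blockers.append("avoid public recipes")
    if not plan.deployer_preinstalled:
        blockers.append("install the deployer and Cloudbreak while online, e.g. into an AMI")
    return blockers
```

An empty list means none of the internet dependencies mentioned here remain; it does not cover anything Ambari itself may try to download at runtime.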