
Using Amazon EC2 with Cloudbreak and Docker

Explorer

I am asking these questions to get an overview:

1) When using m4.4xlarge instances on Amazon EC2 with Docker/Cloudbreak and deploying a full Hadoop HA blueprint, does each master/slave section of the blueprint get its own m4.4xlarge instance (ending up with quite a few instances), or does everything land on a single m4.4xlarge instance with the resources split up at deployment?

2) Using Docker, I believe we can carve out the resources a container may use from the total resources of the underlying Linux OS and do the necessary installation, right? (yes/no)

3) On Amazon EC2, while doing an HDP deployment using Cloudbreak:

Step 1: I believe we first have to install Cloudbreak on a small EC2 instance type (yes/no).

Step 2: Then, in the Cloudbreak GUI screens, choose the appropriate instance types for the master/slave sections of the blueprint (yes/no).

4) Is there any complete step-by-step documentation, guide, or tutorial anybody can suggest that I can read and follow to do an HDP installation using Docker/Cloudbreak on Amazon EC2?

5) In general, beyond the hourly cost of the instance types, what other typical costs add up when deploying an HDP cluster on Amazon EC2? Just as an example, to get an idea, is there any cost spreadsheet available anywhere that shows what deploying a minimal cluster on Amazon EC2 costs?

1 ACCEPTED SOLUTION

Rising Star

First of all, the latest release that used Docker on the public clouds (AWS, GCP, and Azure) was 1.2.3; versions 1.3.0 and newer do not use Docker to run the Hadoop services. For 1.2.3, the answers are:

1. Containers were started with net=host, so there was one container per VM; Docker was mostly used for packaging and distribution, and every node ran exactly one container. You needed as many nodes as the size of the cluster (a minimal sketch follows this list).

2. You can, but the container got the full VM's resources (see #1; a sketch of Docker's resource caps also follows this list).

3. You need to install the Cloudbreak application somewhere; that can be an EC2 instance, for example, but on-prem works as well. Note that the Cloudbreak application is not the cluster itself: it is composed of several micro-services, and these run inside containers. You can drive it through the GUI, CLI, or API; every host group can have a different instance type, so the cluster can be heterogeneous (an illustrative blueprint skeleton follows this list).

4. http://sequenceiq.com/cloudbreak-docs/

5. It depends on the number of nodes you'd like to provision. There are no additional costs on top of the EC2 price, so the math is fairly easy: multiply the number of nodes you think your cluster will have by the number of hours it will run and by the hourly instance rate (a worked example follows this list). In Cloudbreak you can fully track usage costs on the Accounts tab.
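To illustrate #1: with net=host the container shares the host's network stack, which is one reason there was exactly one container per VM. A minimal sketch; the image name is hypothetical, not the actual Cloudbreak image:

    # One service container per VM, sharing the host's network stack:
    docker run -d --net=host hypothetical/ambari-agent:1.2.3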
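To illustrate #2: Docker itself can cap a container's share of the host's CPU and memory, even though Cloudbreak 1.2.3 did not constrain its containers this way. A minimal sketch with made-up limits (--memory is a long-standing flag; --cpus needs a reasonably recent Docker):

    # Cap the container at 4 CPUs and 8 GB of RAM:
    docker run -d --cpus=4 --memory=8g hypothetical/ambari-agent:1.2.3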
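To illustrate the host-group point in #3: Cloudbreak takes an Ambari blueprint, and in the GUI each host group is mapped to a template with its own instance type. An illustrative skeleton only; the names, components, and cardinalities are placeholders, not a working HA blueprint:

    {
      "Blueprints": { "blueprint_name": "example-ha", "stack_name": "HDP", "stack_version": "2.4" },
      "host_groups": [
        { "name": "master", "cardinality": "2",
          "components": [ { "name": "NAMENODE" }, { "name": "ZKFC" } ] },
        { "name": "worker", "cardinality": "3",
          "components": [ { "name": "DATANODE" }, { "name": "NODEMANAGER" } ] }
      ]
    }

Here "master" could be mapped to an m4.4xlarge and "worker" to a cheaper instance type, which is what makes the cluster heterogeneous.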
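To make the math in #5 concrete, a quick sketch; the $0.80/hour rate is hypothetical, so check current EC2 pricing for your instance type and region:

    # 6 nodes, running 24 hours a day for 30 days, at a hypothetical $0.80/hour:
    python3 -c "print(6 * 24 * 30 * 0.80)"   # -> 3456.0 (USD per month)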

