Member since
05-05-2016
08-18-2016
07:51 PM
First of all, the latest release that used Docker on public clouds (AWS, GCP, and Azure) was 1.2.3. Versions 1.3.0 and newer do not use Docker to run the Hadoop services. For 1.2.3, the answers are:
1. Containers were started with net=host, so there was one container per VM. Docker was mostly used for packaging and distribution, thus every node ran a single container; you needed as many nodes as the size of the cluster.
2. You can, but each container received the full VM's resources (see #1).
3. You need to install the Cloudbreak application (anywhere; that can be an EC2 instance, for example, but on-premises as well). The Cloudbreak application (note: it is not the cluster) is composed of several microservices, and these run inside containers. It can be driven via GUI, CLI, or API, and every host group can use a different instance type, so the cluster can be heterogeneous.
4. http://sequenceiq.com/cloudbreak-docs/
5. It depends on the number of nodes you'd like to provision. There are no additional costs on top of the EC2 price, so the math is fairly easy: multiply the number of nodes you think your cluster will have by the number of hours it will run, then by the hourly instance price. In Cloudbreak you can fully track usage costs on the Accounts tab.
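The cost math in #5 can be sketched as a one-liner; the node count, hours, and hourly price below are illustrative placeholders, not real EC2 prices:

```python
def cluster_cost(nodes, hours, price_per_node_hour):
    """Estimate cluster cost: nodes x hours x hourly instance price.
    Cloudbreak adds no charge on top of the EC2 price, so this is the total."""
    return nodes * hours * price_per_node_hour

# e.g. a 10-node cluster running 24 hours on an instance priced at $0.50/hour
print(cluster_cost(10, 24, 0.50))  # 120.0
```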
06-22-2016
06:37 AM
3 Kudos
@sirisha A work-preserving ResourceManager restart ensures that applications continue to function during a ResourceManager restart with minimal impact to end users. The overall concept is that the ResourceManager preserves application and queue state in a pluggable state store and reloads that state on restart. While the ResourceManager is down, ApplicationMasters and NodeManagers continuously poll the ResourceManager until it comes back up. If you also have automatic failover enabled, this polling time is reduced and your jobs resume in a shorter amount of time, so I would suggest enabling both options in the configuration. Hope this information helps.
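As a sketch, the relevant yarn-site.xml settings look like the following; the property names are from the Apache Hadoop YARN documentation, and the ZooKeeper-backed state store shown here is one of the pluggable options (values are illustrative):

```xml
<!-- Enable work-preserving RM restart with a pluggable state store -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Enable RM HA with automatic failover, as suggested above -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```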
06-21-2016
10:08 AM
High availability in Flume is just a matter of agent configuration, regardless of whether you're using Ambari or not. Here are a few links you can check:
https://flume.apache.org/FlumeUserGuide.html#flow-reliability-in-flume
https://flume.apache.org/FlumeUserGuide.html#failover-sink-processor
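For reference, a failover sink group in an agent configuration looks roughly like this; the agent and sink names (a1, k1, k2) are hypothetical, while the property keys follow the Flume User Guide's failover sink processor section:

```properties
# Group two sinks and let the failover processor pick between them
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# The higher-priority sink (k1) is used first; k2 takes over if k1 fails
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Max backoff (ms) before a failed sink is retried
a1.sinkgroups.g1.processor.maxpenalty = 10000
```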