I've been trying to deploy an all-in-one HDP machine on AWS using Ambari, similar to the sandbox, with the latest versions of Ambari and HDP. I've found that if I use a Large instance (8 GB of RAM and 2 vCPUs/cores), the ambari-server crashes every time during deployment, breaking the installation and requiring a total rebuild.
However, using an XL instance, with 16GB of RAM and 4 cores/vCPU the installation works fine every time and I get no errors. I have tested this 3 times, doing installations on both servers side-by-side (XL vs Large instances).
* The services installed on the single node are: HDFS, YARN + MapReduce2, Hive, HBase, Pig, Sqoop, Oozie, ZooKeeper, Flume, Ambari Infra, Ambari Metrics, Kafka, SmartSense, Spark and Spark2.
What is strange about this is that I can't find any reference to the hardware requirements anywhere, and you can run most of these services in the sandbox with less memory and fewer cores. Hortonworks and most big data trainers drill into you that Hadoop will 'run on all commodity hardware', but these hardware constraints seem to suggest otherwise.
Can anyone shed any light on this?
In the sandbox you will find that many services are in Maintenance Mode. You do not need to run all the services at once, because that will consume a lot of resources. You can stop the services that you do not want to run and then put them in maintenance mode. (For example, when you want to do a POC on HBase and not on Hive, you can stop the Hive service and put it into maintenance mode.)
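If you would rather script this than click through the UI, here is a rough sketch of building the two Ambari REST calls (stop a service, then turn on maintenance mode). It assumes the usual `/api/v1/clusters/<cluster>/services/<service>` endpoint with a `PUT` and the `X-Requested-By` header; the host, cluster name, and credentials are placeholders, and the helper names are my own. Please check the payloads against the API docs for your Ambari version before relying on it.

```python
import base64
import json
import urllib.request


def stop_service_body(service):
    # In Ambari, a stopped service is one whose desired state is INSTALLED.
    return {
        "RequestInfo": {"context": f"Stop {service}"},
        "Body": {"ServiceInfo": {"state": "INSTALLED"}},
    }


def maintenance_body(service, on=True):
    # Assumed payload shape for toggling maintenance mode on a service.
    state = "ON" if on else "OFF"
    return {
        "RequestInfo": {"context": f"Set maintenance mode {state} for {service}"},
        "Body": {"ServiceInfo": {"maintenance_state": state}},
    }


def build_request(base_url, cluster, service, body, user, password):
    # Hypothetical helper: builds the PUT request but does not send it.
    url = f"{base_url.rstrip('/')}/api/v1/clusters/{cluster}/services/{service}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "X-Requested-By": "ambari",
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    data = json.dumps(body).encode()
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

To actually send it, pass each request to `urllib.request.urlopen(...)`, for example first `build_request("http://ambari-host:8080", "mycluster", "HIVE", stop_service_body("HIVE"), "admin", "admin")` and then the same call with `maintenance_body("HIVE")`. The stop call is asynchronous on the Ambari side, so you may want to wait for it to finish before moving on.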
Alternatively, the services and components that you are installing on the single host should be allotted less memory (heap, etc.). It also depends on the kind of job that you are planning to run.
Also keep checking the used and free memory on your single host, where you are installing almost every service.
# free -m
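If you want to watch this from a script during the install instead of rerunning `free -m` by hand, something like the following should work on Linux. It is only a sketch: it reads `/proc/meminfo`, and the function names and the threshold you pass in are my own assumptions, not anything Ambari provides.

```python
def parse_meminfo(text):
    """Return the kB-valued fields of /proc/meminfo as whole MiB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip malformed lines and counters that carry no kB unit.
        if not sep or len(parts) != 2 or parts[1] != "kB":
            continue
        values[name.strip()] = int(parts[0]) // 1024
    return values


def memory_is_low(meminfo_mib, min_available_mib):
    # Prefer MemAvailable; fall back to MemFree on older kernels.
    available = meminfo_mib.get("MemAvailable", meminfo_mib.get("MemFree", 0))
    return available < min_available_mib


def read_meminfo(path="/proc/meminfo"):
    # Linux only: read and parse the live memory counters.
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

For example, `memory_is_low(read_meminfo(), 1024)` returns `True` once less than 1024 MiB is available. Pick a threshold that makes sense for the services you are about to start.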
Interesting - so basically it sounds like the Ambari installer is not actually inspecting the memory available on the machines in the cluster (in my case, just a single node) and adjusting the default parameters for each service accordingly.
Because of this, the defaults it provides for things like heap size cause the ambari-server to crash and the installation fails before it finishes. I was hoping Ambari was more aware of its available hardware resources.
This is the second case in which I've seen the ambari-server break itself because of poor automation. If you install the ambari-server using MariaDB from, say, the Red Hat repository and then install a service which forces the installation of MySQL (such as Hive), it will force-install MySQL Community, overriding MariaDB and removing its own database.
Also, during the automated installation, I don't believe it's possible to force a service into maintenance mode automatically after it is installed to save on resources. This would be a good option/feature if you don't know which services you need yet and you just want a basic, simple cluster.
Thanks for the link - nice read.