To set up an Ambari-managed HDP cluster, the components of the services below each require a relational database for their metastore.
While a lab or sandbox environment can be set up with the default databases for these components, using them is strongly discouraged for Dev/QA/UAT/Production clusters. Due diligence and planning are needed to ensure that the database selected is appropriate for an enterprise-grade production cluster.
The key areas to consider when selecting a database for Ambari and HDP components are described below.
The databases currently supported for Ambari and the various HDP components are listed below.
For HDP services and Ambari Server to be truly highly available, the relational database that backs the Hive Metastore, Ambari Server, Oozie Server, and so on must itself be made highly available, following the best practices defined for the database system in use. Otherwise the database remains a single point of failure for the service.
It is therefore important to select a relational database that supports high availability. Discuss this with your in-house DBA when planning either a new database or the use of an existing in-house database for the HDP deployment.
Cost of licensing and support
The HDP support subscription does not cover licensing or support for the databases used by Ambari Server and HDP components such as the Hive Metastore, so these incur additional cost. The cost of licensing and support is therefore an important factor in selecting a database for Ambari Server and the HDP stack.
Note: Contact your in-house database team or the database vendor for details on licensing and support costs.
Database maintenance and management
The database used for Ambari Server and HDP components needs regular maintenance and management, such as backups, HA setup, and recovery. When selecting a database for Ambari/HDP, make sure your organization has skilled in-house staff or DBAs available to perform these activities.
Using different relational databases for different components (for example, Postgres for Ambari and MySQL for Hive) is not good practice, because it complicates management and maintenance. Pick one relational database and use it for all components, for example MySQL for everything or Postgres for everything.
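The single-engine recommendation can be checked against a deployment plan before anything is installed. The following is a minimal Python sketch, not part of Ambari or HDP: the `planned_databases` mapping and the `find_mixed_engines` helper are hypothetical names introduced only for illustration, and the engine assignments are made-up sample values.

```python
from collections import Counter

# Hypothetical plan: component name -> database engine chosen for it.
# The values are illustrative assumptions, not a recommended layout.
planned_databases = {
    "Ambari Server": "postgres",
    "Hive Metastore": "mysql",
    "Oozie Server": "postgres",
}


def find_mixed_engines(plan):
    """Return the components whose engine differs from the most common one."""
    if not plan:
        return {}
    # Compare engine names case-insensitively.
    counts = Counter(engine.lower() for engine in plan.values())
    majority_engine, _ = counts.most_common(1)[0]
    return {
        component: engine
        for component, engine in plan.items()
        if engine.lower() != majority_engine
    }


outliers = find_mixed_engines(planned_databases)
if outliers:
    print("Mixed engines in plan:", outliers)
else:
    print("All components use the same engine.")
```

With the sample plan above, the Hive Metastore entry is reported because it is the only component not using the majority engine. A check like this only flags inconsistency; which engine to standardize on still depends on the support, cost, and HA considerations discussed above.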