Database Sizing and recommendation for Ambari and HDP components

What's the approximate load on the DB if we colocate Ambari, Oozie, Hive, Ranger admin and audit, etc. on the same DB cluster?

- Cluster Size ~ <100 nodes, 100-500 nodes, 500-1000 nodes and 1000+

- Number of Users for Hive and Oozie ~100+

1 ACCEPTED SOLUTION

Contributor

We had some performance issues with a low-profile config (4 vCores, 8 GB RAM), especially with Oozie. Right now we recommend at least 4 vCores and 24 GB of RAM. If you're planning an IaaS deployment on Azure and using SQL Azure as the metadata repository, start directly with an S2/S3 instance: if you use Oozie, that's the minimum requirement.

Also pay attention to Ranger: a shared DB is OK for Ranger admin and users, but for audits you need to watch the DB size carefully, because it can easily grow very fast. Use a script to truncate the audit table, or use a different instance.
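To make the "use a script" suggestion concrete, here is a minimal sketch of a cleanup job. It is not the script the poster actually ran: instead of truncating the whole table, it deletes rows older than a retention window in small batches, which is a gentler variant. The table name `audit_demo`, the column `event_time`, and the helper names are hypothetical placeholders, and the demo runs against an in-memory SQLite database. Against MySQL, PostgreSQL, or SQL Azure the placeholder syntax and the `rowid` trick would need adapting.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # sortable text timestamps (assumption)


def purge_old_audits(conn, table, ts_column, keep_days, now=None, batch_size=1000):
    """Delete rows older than keep_days, batch by batch; return rows deleted.

    table and ts_column are interpolated into the SQL, so they must come
    from trusted configuration, never from user input.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=keep_days)).strftime(TS_FORMAT)
    total = 0
    while True:
        # rowid is SQLite-specific; other databases need a primary-key column here.
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} WHERE {ts_column} < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch to keep each transaction short
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def build_demo_db():
    """Create a throwaway table with one audit row per day for 60 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audit_demo (id INTEGER PRIMARY KEY, event_time TEXT)")
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    rows = [((now - timedelta(days=d)).strftime(TS_FORMAT),) for d in range(60)]
    conn.executemany("INSERT INTO audit_demo (event_time) VALUES (?)", rows)
    conn.commit()
    return conn, now


if __name__ == "__main__":
    conn, now = build_demo_db()
    deleted = purge_old_audits(conn, "audit_demo", "event_time",
                               keep_days=30, now=now, batch_size=10)
    left = conn.execute("SELECT COUNT(*) FROM audit_demo").fetchone()[0]
    print(f"deleted={deleted} remaining={left}")
```

Deleting in batches keeps each transaction short, so the purge is less likely to block the other services sharing the same DB. If you really do want to drop everything, a plain truncate is simpler.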

8 REPLIES

Is the host only going to contain DBs, or will it also contain Ambari Server, HiveServer, etc?

Spreading it out is wise, but if you're really constrained, the Ambari DB is usually no more than 100 MB. There's an article on how to optimize the Ambari DB for large clusters (200+ nodes): http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_ambari_reference_guide/content/ch_tuning_...

@Alejandro Fernandez There is no guidance on sizing the database server, i.e. the CPUs and RAM required, for small, medium, and large clusters.

@Pardeep It will be very difficult to forecast the load. Assumption: if all connections are active, then the load can be low to medium.

Mentor

The base DBs are usually small in size. The Ambari Server and HiveServer components should just be DB schemas. A golden rule is to always have failover configured, tested, and documented.

Don't forget that none of these DBs is OLTP, so latency shouldn't be an issue. A cheap rackable server with 24 to 36 GB RAM and 8 CPUs should be fine.

I would be interested in planning the hardware, i.e. CPU and RAM, for the database server(s). The disk requirement won't be high and isn't much of a concern.

@Andrea D'Orio Re: Ranger DB

The recommended approach is to go with Solr instead of a DB "for all the new deployments".

Mentor

Just configure Nagios and Ganglia; those two monitoring tools should give you some good metrics.