Database Sizing and recommendation for Ambari and HDP components

What's the approximate load on the DB if we co-locate Ambari, Oozie, Hive, Ranger admin & audit, etc. on the same DB cluster?

- Cluster Size ~ <100 nodes, 100-500 nodes, 500-1000 nodes and 1000+

- Number of Users for Hive and Oozie ~100+

1 ACCEPTED SOLUTION

Rising Star

We had some performance issues with a low-profile config (4 vCores, 8 GB RAM), especially with Oozie. Right now we recommend at least 4 vCores and 24 GB of RAM. If you're planning an IaaS deployment on Azure with SQL Azure as the metadata repository, start directly with an S2/S3 instance: if you use Oozie, that's the minimum requirement.

Also pay attention to Ranger: a shared DB is fine for Ranger admin and users, but for audits you need to watch the DB size carefully, because it can easily grow very fast. Use a script to truncate the audit table, or use a different instance.
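
A minimal sketch of what such a purge script could look like, deleting old audit rows in small batches rather than truncating everything. The table name `xa_access_audit`, the `event_time` column, the 30-day retention and the batch size are all assumptions for illustration, not something confirmed in this thread; check them against your own Ranger audit schema. The demo runs against an in-memory SQLite DB only so the example is self-contained; on MySQL, Oracle or SQL Azure the placeholder style and the batched-delete SQL would need adjusting.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumptions (not from the thread): table/column names, retention, batch size.
AUDIT_TABLE = "xa_access_audit"
TIME_COLUMN = "event_time"


def purge_old_audits(conn, cutoff, batch_size=1000):
    """Delete audit rows older than `cutoff` in batches; return rows deleted."""
    total = 0
    while True:
        # Small batches keep each transaction short on a shared DB.
        cur = conn.execute(
            f"DELETE FROM {AUDIT_TABLE} WHERE id IN ("
            f"SELECT id FROM {AUDIT_TABLE} WHERE {TIME_COLUMN} < ? LIMIT ?)",
            (cutoff.isoformat(sep=" "), batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def demo():
    """Build a throwaway audit table with 60 daily rows and purge it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {AUDIT_TABLE} (id INTEGER PRIMARY KEY, {TIME_COLUMN} TEXT)"
    )
    now = datetime(2016, 1, 31)
    rows = [((now - timedelta(days=d)).isoformat(sep=" "),) for d in range(60)]
    conn.executemany(f"INSERT INTO {AUDIT_TABLE} ({TIME_COLUMN}) VALUES (?)", rows)
    # Keep the last 30 days (hypothetical retention window).
    deleted = purge_old_audits(conn, now - timedelta(days=30), batch_size=10)
    remaining = conn.execute(f"SELECT COUNT(*) FROM {AUDIT_TABLE}").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Scheduled from cron, something like this would keep the audit table bounded; whether 30 days is an acceptable retention depends on your compliance requirements.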

8 REPLIES 8

Is the host only going to contain DBs, or will it also contain Ambari Server, HiveServer, etc?

Spreading it out is wise, but if you're really constrained, the Ambari DB is usually no more than 100 MB. There's an article on how to optimize the Ambari DB for large clusters (200+ nodes) http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_ambari_reference_guide/content/ch_tuning_...

@Alejandro Fernandez There is no guidance on sizing the database server, i.e. the CPUs and RAM required, for small, medium, and large clusters.

Master Mentor

@Pardeep It will be very difficult to forecast the load. Assumption: if all connections are active, then the load can be low to medium.

Master Mentor

The base DBs are usually small in size. The Ambari Server and HiveServer databases should just be schemas on the DB server. A golden rule is to always have failover configured, tested, and documented.

Don't forget that none of these DBs is OLTP, so latency shouldn't be an issue; a cheap rack server with 24 to 36 GB RAM and 8 CPUs should be fine.

I'm interested in hardware planning, i.e. CPU and RAM, for the database server(s). Disk requirements won't be high, so they are not much of a concern.

Master Mentor

@Andrea D'Orio Re: Ranger DB

The recommended approach is to go with Solr instead of a DB "for all the new deployments".

Master Mentor

Just configure Nagios and Ganglia; those two monitoring tools should give you some good metrics.