CM's Metadata Data Store (Postgres)

Contributor

All,

Your opinion on this topic will be greatly appreciated.

I understand the default RDBMS for CM's metadata data store is PostgreSQL.

Because my firm's list of strategic RDBMSs doesn't include PostgreSQL, I am forced to deploy another relational database.

<Q1> How often does CM talk to its backend metadata store?

<Q2> Is it possible to quantify the amount of data traffic from the data store to the apps?

<Q3> Should (or must) the database reside on the same server as CM? (In my view YES, but I am looking for strong justifications!)

 

The same questions hold for CDH services (Hive, Oozie, Hue) where I would like to have MySQL instead of a 'strategic' RDBMS!!!

 

2 ACCEPTED SOLUTIONS

Master Collaborator

Our documentation clearly states that we support MySQL, PostgreSQL and Oracle for the majority of the runtime databases used by the platform.

 

Please see the following documentation:

 

For CM database requirements:

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ig_cm_requirements.h...

 

For CDH database requirements:

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_req_supported_ve...

Super Collaborator

Hello TS,

 

MySQL is a perfectly good choice for the metadata store for the entities you mention. CM doesn't have a "default RDBMS" per se, but certain installation methods can pull in PostgreSQL for you. It's perfectly fine to use MySQL instead, and I'd encourage it. Our documentation [1], which you can cite to your firm, shows that it is supported.

 

RDBMS choice aside, the most important consideration is making sure that you have planned for and allocated sufficient space (or the ability to easily grow the available space) for the entities that will use the RDBMS. That's the absolute key. Some people love PostgreSQL, others are very savvy with MySQL. Yet others may have a mandate to use Oracle 11g in their environment. Great: Cloudera Manager and CDH support any of these options!

 

As for your questions:

<Q1> How often does CM talk to its backend metadata store?

 

A1 - Cloudera Manager remains in constant contact with its metadata store. 


<Q2> Is it possible to quantify the amount of data traffic from the data store to the apps?

 

A2 - I've not done this recently, but it would be an interesting exercise. Keep in mind that it's not just the Cloudera Manager Server: a few of the other Cloudera Management Services also use an RDBMS (and may be considered for placement on the same instance as the one CM uses), so any measurement should account for them too.
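
As a rough starting point for Q2 on MySQL, one option is to sample the server's cumulative `Bytes_sent` and `Bytes_received` status counters twice (for example with `SHOW GLOBAL STATUS LIKE 'Bytes_%'`) and turn the difference into a per-second rate. The sketch below only does the arithmetic; the sample numbers and the `traffic_rates` helper are made up for illustration, and the counters are server-wide, so they cover every client of that MySQL instance, not CM alone.

```python
from typing import Dict


def traffic_rates(before: Dict[str, int],
                  after: Dict[str, int],
                  interval_s: float) -> Dict[str, float]:
    """Return a per-second rate for each counter present in both snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates: Dict[str, float] = {}
    for name, start in before.items():
        if name not in after:
            continue
        delta = after[name] - start
        # The counters restart from zero when the server restarts,
        # so a negative delta is skipped rather than reported.
        if delta >= 0:
            rates[name] = delta / interval_s
    return rates


# Hypothetical snapshots taken 60 seconds apart (values are invented).
before = {"Bytes_sent": 1_000_000, "Bytes_received": 400_000}
after = {"Bytes_sent": 1_600_000, "Bytes_received": 520_000}
rates = traffic_rates(before, after, 60.0)
```

Sampling over a longer window, and while the cluster is doing typical work, should give a more representative figure than a single short interval.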


<Q3> Should (or must) the database reside on the same server as CM? (In my view YES, but I am looking for strong justifications!)

 

A3 - The database instance is not required to be located on the same node as Cloudera Manager Server, but if that's what makes sense in your deployment then it's fine to do so. Opting for a colocated database can, in some cases, remove network latency from the picture. However, if you have access to a dedicated database admin team that can deploy and manage MySQL (while also making sure it is backed by reliable and fast storage), then it can make more sense to use that rather than a non-dedicated disk that's local to the Cloudera Manager Server. Your circumstances will dictate what's best.
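
If you want a number to support the colocated-versus-remote decision, a crude check is to time TCP connections from the Cloudera Manager Server host to each candidate database host. This is only a sketch: `connect_latency_ms` is a hypothetical helper, the host names in the usage comment are placeholders, 3306 is MySQL's default port, and TCP connect time is merely a proxy for the latency real queries would see.

```python
import socket
import time


def connect_latency_ms(host: str, port: int,
                       attempts: int = 5, timeout: float = 2.0) -> float:
    """Return the median time, in milliseconds, to open a TCP connection."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close the connection; only the setup is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]


# Example usage (placeholder host names, run from the CM Server host):
#   local_ms = connect_latency_ms("127.0.0.1", 3306)
#   remote_ms = connect_latency_ms("db.example.internal", 3306)
```

A small difference between the two figures would suggest that network latency alone is not a strong argument for colocation.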

 

Refer to the document 'Storage Space Planning for Cloudera Manager', as it will also help you take note of the various services that use an RDBMS and some of the considerations you should weigh before deploying them.

 

Regards,

--

Mark S.

 

 [1] - http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ig_cm_requirements.h...
