
Backend Database for Hadoop Services


Hi All,

 

Based on your experience/knowledge, can you suggest which database (MySQL or Oracle) should be used as the backend database for services such as Cloudera Manager, Hive, Sqoop, Sentry, and Oozie?

 

I have experience with MySQL, but we plan to use Oracle because it has certain maintenance advantages in our organization.

 

It would be great if someone could highlight the advantages and disadvantages of Oracle.

 

 


Re: Backend Database for Hadoop Services

Contributor

Hello, I have this question too and if you don't mind, I'd like to add some other considerations.

 

I see that CDH services usually declare compatibility with Oracle, MySQL and Postgres. However, not all of them support all three (Hue, for instance), and on closer inspection MySQL seems to be the only one that is compatible across nearly all services.

 

So I think that for now the best bet is on MySQL (I don't want Oracle, anyway). 

I am doing some research on a DB that supports HA. So far in my search I have found two solutions that provide HA for MySQL: Percona XtraDB Cluster and MariaDB Galera, where the first actually uses libraries from the latter and adds some other interesting things.

 

My question is: what is Cloudera's position regarding a backend DB in HA?

Let me say that the documentation does not cover this well: there are guides for making HS2 and HMS read from an HA DB, but few considerations and best practices. My ultimate goal is to make HMS and HS2 truly HA by adding an HA backend DB with a load balancer on top of it, so I can:

  • load-balance access;
  • obtain a Metastore in true HA;
  • migrate other services such as Cloudera Manager, Hive, Impala, etc. to a real always-on state;
    • thus giving me the option to "hot-swap" services that are failing (e.g. keeping Hive responsive even if one of the servers crashes).
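
To make the load-balancer idea above a bit more concrete, here is a minimal sketch of the kind of health check such a balancer performs: try each DB node in order and route to the first one that accepts a connection. The hostnames and the helper names (`tcp_probe`, `pick_backend`) are made up for illustration, and a plain TCP check is an assumption on my part; it does not tell you whether a Galera/Percona node is actually synced, so a real setup would need a deeper check.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Node = Tuple[str, int]  # (host, port)


def tcp_probe(node: Node, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(
    nodes: Iterable[Node],
    probe: Callable[[Node], bool] = tcp_probe,
) -> Optional[Node]:
    """Return the first node that passes the probe, or None if all fail."""
    for node in nodes:
        if probe(node):
            return node
    return None


if __name__ == "__main__":
    # Hypothetical cluster members; replace with real hosts.
    cluster = [("db1.example.com", 3306), ("db2.example.com", 3306)]
    print(pick_backend(cluster))
```

The probe is passed in as a parameter so the selection logic can be exercised without a live database, and so the TCP check can be swapped for something stricter later.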

I know that Cloudera would probably not endorse one of them over the other, but I'd like some recommendations (maybe one of them is already a Cloudera partner), or to hear whether any tests have been done in the past.

 

I am interested in Percona: while Galera is in alpha state (though they say it is affordable), Percona offers support and reports that some companies already use it in production environments.

I am also interested in paid support.

 

Looking forward to your reply, thanks.

 

Omar

Re: Backend Database for Hadoop Services

That's a nice suggestion; let's see what Cloudera says. I went ahead with MySQL anyway.

Re: Backend Database for Hadoop Services

New Contributor

Hi,

 

I too am working on a similar no-single-point-of-failure solution for the back end.

 

So far, Galera Cluster with MySQL or MariaDB seems to be the one solution that will work. But it may not be easy.

Take a look at OOZIE-2854.

 

Steven

Re: Backend Database for Hadoop Services

Contributor
Kudos to you for pointing this out.

Fortunately, the customers where I'd like to introduce this don't use Oozie, but it certainly is a problem.

What sounds really strange to me is: are we the only three people who have tried to figure this out? Is everyone else just using a single DB instance, or are they all using Oracle?