change from embedded to external database unclear

Contributor

Hello, all. Recently, I installed Cloudera Express to create a new cluster for some of our students. It seems in this version, there's a warning message on the Cloudera Manager:

 

You are running Cloudera Manager in non-production mode, which uses an embedded PostgreSQL database. Switch to using a supported external database before moving into production. More Details

 

I don't recall that from earlier versions of CM's website. Nevertheless, I took this opportunity to install external databases as described in the link referenced by More Details (which was http://tiny.cloudera.com/cm_db_config-5 )

 

However, despite my having created these databases and reconfigured the relevant configuration files (and confirmed them through the Configuration/Database settings tabs), I'm still getting the warning message. Now, I realize I can also turn OFF that message in Configuration/General, but my question is: How do I know that my Cloudera Manager is actually running off the external databases? (I said to use external databases from the very start in the Wizard when running the CM website for the first time.) I note that all three services

[ + ] cloudera-scm-agent
[ + ] cloudera-scm-server
[ + ] cloudera-scm-server-db

are running on the server that hosts the CM website, databases, etc. I cannot stop the server-db daemon: the stop command simply hangs until I quit it, and the service then shows as still running.

 

Am I missing something? What else can I look at? Am I supposed to manually turn off that message? What is triggering it in the first place, if the Cloudera Express was installed with external databases to begin with?

 

Thanks,

Cindy

1 ACCEPTED SOLUTION

Expert Contributor

Hello Cindy,

 

An external database is one that is not on the same host as Cloudera Manager. We check additional parameters in our code to verify that the database is in fact non-local and is not one of the embedded ones we install through deployment Path A. I included the conditions we presently look for in my previous response to you. Database migrations are complex; if you want to avoid this problem altogether, you should follow deployment Path B, especially for production environments, where you will need to set up the RDBMS externally on your own.

 

Selecting external database and pointing at the same local database instances will not make your database an external one. The scm_prepare scripts you are pointing at take a variety of options. External databases will generally not use localhost because they are RDBMS instances located on another host.

 

Generally speaking, placing your database structures on application servers can be bad outside of POC conditions. If you are a licensed customer and need help, please open a case with us.

 

 

---
Customer Operations Engineer | Security SME | Cloudera, Inc.


11 REPLIES

Master Guru

Hi @Cindy,

 

There is a lot packed into your post; I'll do my best to hit the main points:

 

(1)

 

We added a banner in Cloudera Manager 5.9 to warn about using the embedded database in production.  Previously, there was no warning.

 

(2)

 

The https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_installing_configuring_dbs.htm... link only describes creating a new external Cloudera Manager Database.  It does not describe how to migrate the data over.

At this time, Cloudera does not publish documentation regarding migration of Cloudera Manager databases.

 

(3)

 

Without knowing exactly what was configured, it is hard to say what happened, but I think that you haven't configured Cloudera Manager to use the database.  See this section on how to do so:

 

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_installing_configuring_dbs.htm...

 

That script, when run, should update your /etc/cloudera-scm-server/db.properties file (which tells Cloudera Manager what database to use.)
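
As a rough illustration of what that file contains, here is a minimal Python sketch that reads db.properties into a dictionary. The property names are the ones quoted later in this thread; the host db.example.com and the helper names parse_db_properties and load_db_properties are made up for the example. The parser only handles simple key=value lines, not the full Java properties syntax.

```python
from pathlib import Path


def parse_db_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            props[key.strip()] = value.strip()
    return props


def load_db_properties(path="/etc/cloudera-scm-server/db.properties"):
    """Read and parse the file from disk (usually needs root privileges)."""
    return parse_db_properties(Path(path).read_text())


# Illustrative content only; db.example.com is a hypothetical host name.
SAMPLE = """\
# Auto-generated by scm_prepare_database.sh
com.cloudera.cmf.db.type=postgresql
com.cloudera.cmf.db.host=db.example.com
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.setupType=EXTERNAL
"""

props = parse_db_properties(SAMPLE)
print(props["com.cloudera.cmf.db.host"], props["com.cloudera.cmf.db.setupType"])
```

On a real server, load_db_properties() would be the call to use; the sample string just keeps the sketch runnable anywhere.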

 

(4)

 

In order to stop "cloudera-scm-server-db" you need to have all connections to the databases it hosts closed.  Running a "stop" will signal Postgres to not allow new connections and to wait till clients have closed their connections to shut down.  If you have other services using the embedded database, this may never happen.

 

To force a close of all open connections and a shut down:

 

# service cloudera-scm-server-db next_stop_fast

# service cloudera-scm-server-db stop

 

(5)

 

The "embedded" banner is displayed if the following shows that "embeddedDbUsed" is true:

 

<cm_host>:<cm_port>/api/v14/cm/scmDbInfo

 

****WARNING****

 

Make sure you have a backup of your current embedded database (located by default in /var/lib/cloudera-scm-server-db) before trying any migration.  If you point Cloudera Manager at a new database, that database will be blank and you will be asked to create a cluster again.
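
One way to take such a backup is a plain file-level archive of that directory. The Python sketch below is only an illustration, and it assumes the cloudera-scm-server and cloudera-scm-server-db services have been stopped first, since copying a live PostgreSQL data directory can produce an inconsistent archive. The function name backup_directory and the destination path are hypothetical.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source_dir, dest_dir):
    """Write a timestamped .tar.gz of source_dir into dest_dir and return its path."""
    source = Path(source_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"not a directory: {source}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name
        tar.add(str(source), arcname=source.name)
    return archive


# Example call (run as root, with the services stopped; /root/cm-backups is a made-up path):
# backup_directory("/var/lib/cloudera-scm-server-db", "/root/cm-backups")
```

A logical dump taken with PostgreSQL's own tools would be an alternative; the archive above is simply the most direct reading of "back up the directory".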

 

I hope some of that helps; let us know if you have other questions or need clarification.

 

 

Contributor

Oh thank you. That does help clarify a number of things. 

 

On (2), when I was installing the CM (via the website wizard which steps through each part, including whether or not to use embedded or external databases) I chose external, which is why I find/found the banner so confusing.

 

As for point (3) I did not run that script in my most recent install (yes, I did this a couple times, starting over once or twice with a fresh OS install each time). But right now, this is what the file says:

root@mymachine:~# more /etc/cloudera-scm-server/db.properties
# Auto-generated by scm_prepare_database.sh on Wed Feb 8 18:45:16 PST 2017
#
# For information describing how to configure the Cloudera Manager Server
# to connect to databases, see the "Cloudera Manager Installation Guide."
#
com.cloudera.cmf.db.type=postgresql
com.cloudera.cmf.db.host=localhost
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.setupType=EXTERNAL
com.cloudera.cmf.db.password=***
root@mymachine:~#

 

However with regard to your note in (5) I do note that  the .../api/v14/cm/scmDbInfo location 

gives me

{
  "scmDbType" : "POSTGRESQL",
  "embeddedDbUsed" : true
}
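
A response like the one above can also be checked from a script. The following Python sketch builds the scmDbInfo URL and reads the embeddedDbUsed flag from the JSON body. The API path and field name come from this thread; the host name cm.example.com, port 7180, the use of HTTP basic authentication, and all function names are assumptions made for illustration.

```python
import base64
import json
import urllib.request


def scm_db_info_url(host, port, api_version="v14", scheme="http"):
    """Build the scmDbInfo endpoint URL quoted in this thread."""
    return f"{scheme}://{host}:{port}/api/{api_version}/cm/scmDbInfo"


def embedded_db_used(payload):
    """Return the embeddedDbUsed flag from an scmDbInfo JSON response body."""
    return bool(json.loads(payload)["embeddedDbUsed"])


def fetch_scm_db_info(url, username, password, timeout=10):
    """GET the endpoint, assuming HTTP basic auth, and return the body as text."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Offline check against the response body shown above.
sample_body = '{ "scmDbType" : "POSTGRESQL", "embeddedDbUsed" : true }'
print(embedded_db_used(sample_body))

# Against a live server (hypothetical host and credentials):
# body = fetch_scm_db_info(scm_db_info_url("cm.example.com", 7180), "admin", "secret")
# print(embedded_db_used(body))
```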

 

Hm, now I'm not sure what to think. It seems like something is inconsistent here. Any suggestions?

 

Thanks again.

Expert Contributor

Hello.

 

Despite what is in your configuration, your database is not classified as an external one. While the db.setupType parameter shows a value of EXTERNAL, your database is located on the local host and is therefore not really external. External databases are RDBMS instances that do not reside on the same host as the service. We recommend that production deployments use externally available databases.

 


@Cindy wrote:
root@mymachine:~# more /etc/cloudera-scm-server/db.properties

# Auto-generated by scm_prepare_database.sh on Wed Feb 8 18:45:16 PST 2017
#
# For information describing how to configure the Cloudera Manager Server
# to connect to databases, see the "Cloudera Manager Installation Guide."
#
com.cloudera.cmf.db.type=postgresql
com.cloudera.cmf.db.host=localhost
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.setupType=EXTERNAL
com.cloudera.cmf.db.password=***


 

 

Reviewing the code base, I can see that the following conditions will result in the database being identified as embedded.

 

// CM is using embedded db, if:
// 1. db type is Postgresql
// 2. db host is 'localhost'
// 3. db port is 7432
// 4. db name is 'scm'
// 5. db user is 'scm'
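
Read as a conjunction (all five conditions must hold, which is an assumption about the comment above), the check could be sketched in Python as follows. The function name looks_embedded is hypothetical, the case-insensitive comparison of the database type is a guess, and this is an illustration rather than Cloudera's actual code.

```python
def looks_embedded(db_type, host, port, name, user):
    """Return True when all five conditions listed above hold at once."""
    return (
        db_type.lower() == "postgresql"  # 1. db type is Postgresql (case handling assumed)
        and host == "localhost"          # 2. db host is 'localhost'
        and int(port) == 7432            # 3. db port is 7432
        and name == "scm"                # 4. db name is 'scm'
        and user == "scm"                # 5. db user is 'scm'
    )


# A local PostgreSQL on the embedded port with the default names matches.
print(looks_embedded("postgresql", "localhost", 7432, "scm", "scm"))

# Changing any single value, such as the host, stops the match.
print(looks_embedded("postgresql", "db.example.com", 5432, "scm", "scm"))
```

How the port is obtained when db.properties lists only a host name is not stated in this thread, so the sketch takes it as an explicit argument.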

---
Customer Operations Engineer | Security SME | Cloudera, Inc.

Contributor

Then how do I change it from embedded to external? I followed the directions given in the warning message *exactly*.

 

For some update on my situation, as a matter of fact, when I wound up rebooting the server this was on (for other reasons), the discrepancy corrupted the entire cluster installation and I had to reinstall everything. I have left it embedded this time around because honestly with 10 servers in the cluster, this is not something that will overwhelm the database.

 

But it seems to me that if you are going to have this big blinking warning and a recommendation to switch over to external databases, that set of instructions needs to actually work. What did I miss the first time around that left things in a half-and-half state? The configuration in the CM Web UI had all external databases set up, and they tested correctly using the testing buttons, and so on. That apparently did not agree with the db.properties file and the ../api/v14/cm/scmDbInfo page info. So what were the missing steps in the procedure outlined in http://tiny.cloudera.com/cm_db_config-5 ?

Expert Contributor

Hello Cindy,

 

An external database is one that is not on the same host as Cloudera Manager. We check additional parameters in our code to verify that the database is in fact non-local and is not one of the embedded ones we install through deployment Path A. I included the conditions we presently look for in my previous response to you. Database migrations are complex; if you want to avoid this problem altogether, you should follow deployment Path B, especially for production environments, where you will need to set up the RDBMS externally on your own.

 

Selecting external database and pointing at the same local database instances will not make your database an external one. The scm_prepare scripts you are pointing at take a variety of options. External databases will generally not use localhost because they are RDBMS instances located on another host.

 

Generally speaking, placing your database structures on application servers can be bad outside of POC conditions. If you are a licensed customer and need help, please open a case with us.

 

 

---
Customer Operations Engineer | Security SME | Cloudera, Inc.

Expert Contributor

First, thanks for the helpful detailed explanation.  We have a similar issue of migrating from default embedded DB to a separate PostgreSQL instance.

 

Some comments:

 

  1. The documentation needs to be clearer - the criteria you listed for determining "embeddedness" are not intuitive and could not have been inferred from the documentation.  Your writeup should have been included right there.
  2. The embeddedness criteria seem over-strict.  Insisting the DB be off-cluster is based on the old 3-tier architecture assumption - on the other hand, the Hadoop architectural principle is about co-hosting data and software.  On the practical side, basing such a central component off-cluster just seems needlessly inefficient and difficult to manage.  Can't the best practice be to use one dedicated node for CM, CMS, and DB?  Can Cloudera provide some guidelines? 
  3. For production use, the external DB option requires too many manual steps across multiple services.  Can Cloudera Manager provide more central admin and integration?  Including transparent migration from embedded DB.  This again requires the DB node to be part of the cluster under CM management.

 

Thanks,

Miles Yao

Contributor

Completely agreed. I had no idea "external db" was meant to be on a server *external to the cluster*. I was doing exactly what Miles described: a single cluster node that is reserved for CM/DB & Namenode management, with all the other nodes as workhorse datanodes (and zookeeper, yarn, etc).  I have reverted to embedded and turned off the warning message for my part, but future users would certainly benefit if that page (especially since it is linked directly from the warning message) were a lot more detailed.

 

Thanks,

Cindy

Super Collaborator

Hi, this is an old post, but I just came across it.

I don't agree with the explanation for EXTERNAL database.

According to the initial db.properties file (I am copying it from the file):

# The db setup type
# By default, it is set to INIT
# If scm-server uses Embedded DB then it is set to EMBEDDED
# If scm-server uses External DB then it is set to EXTERNAL
com.cloudera.cmf.db.setupType=INIT

Although it is a good recommendation to have the DB on another host, in this case external means anything other than embedded (cloudera-scm-server-db). And yes, when the DB (e.g. MySQL) resides on the same host and "com.cloudera.cmf.db.host=localhost", the scm-prepare script sets "setupType" to EXTERNAL. And it works fine.

Master Guru

@GeKas,

 

You are correct.

 

Thank you for clarifying that EXTERNAL means NOT EMBEDDED.

An external database server can be on the same host as Cloudera Manager.

 

-Ben