Explorer
Posts: 22
Registered: ‎02-17-2015

Production deployment reuse external database?

Cloudera Director requires a pre-installed DB or an embedded H2.

In the config files I can change the DB to an external MySQL database.

 

First question:

- Does Cloudera Director support Postgres, and is there a sample config?

 

The plan is to deploy the DB on a separate node in the cluster, close to Cloudera Director and Cloudera Manager.

 

Second question:

- Should I reuse the external DB for Cloudera Director, Cloudera Manager, Hive, HUE and OOZIE?

 

Each service would have its own database but use the same MySQL / Postgres instance. That makes the DB easier to back up and to make highly available. If only MySQL is supported by Cloudera Director, then the other services have to use that DB type as well.

Cloudera Employee
Posts: 76
Registered: ‎10-28-2014

Re: Production deployment reuse external database?

MrBee,

 

Cloudera Director does support using an existing Postgres database server for the Deployment and Cluster it provisions. Please refer to the databaseServers section of aws.reference.conf, where you should find a section like this:

 

databaseServers {

    #
    # Provision RDS database server
    #

    # rds-mysql-prod1 {
    #   type: mysql
    #   user: root
    #   password: rootpassword
    #   instanceClass: db.m3.medium
    #   dbSubnetGroupName: REPLACE-ME
    #   vpcSecurityGroupIds: sg-REPLACE-ME
    #   allocatedStorage: 10
    #   engineVersion: 5.5.40b
    #   tags {
    #     owner: ${?USER}
    #   }
    # }

    #
    # Use an existing MySQL server
    #
    #
    # existingmysql1 {
    #   type: mysql
    #   host: REPLACE-ME # with IP address of database server
    #   port: 3306
    #   user: root
    #   password: rootpassword
    # }
    #

    #
    # Use an existing PostgreSQL server
    #
    #
    # existingpostgres1 {
    #   type: postgresql
    #   host: REPLACE-ME # with IP address of database server
    #   port: 5432
    #   user: postgres
    #   password: rootpassword
    # }
    #

}

You can uncomment and fill in the "existingpostgres1" block to register this database server with Director for use by the Deployment and Cluster in this conf file. This database is not managed (i.e., won't be terminated) by Director. The name of the block will be used as the 'databaseServerName'.

 

The 'databaseServerName' fields in the following examples should be modified to match the block name from the above sample.
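For illustration only, here is a minimal sketch of how the uncommented block might look. The host address and password are placeholders I made up, not values from aws.reference.conf; the remaining keys are copied from the commented sample above.

```
databaseServers {

    # Existing PostgreSQL server; Director registers it but does not manage or terminate it
    existingpostgres1 {
        type: postgresql
        host: 10.0.0.10        # placeholder: replace with the IP address of your database server
        port: 5432
        user: postgres
        password: REPLACE-ME   # placeholder: replace with the real password
    }

}
```

With this in place, 'existingpostgres1' is the value to use for 'databaseServerName' in the templates.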

 

Refer to the aws.reference.conf for a sample of how to specify that database server for use by the Deployment:

 

cloudera-manager {

...

    databaseTemplates {
        CLOUDERA_MANAGER {
            name: scmt
            databaseServerName: rds-mysql-prod1
            databaseNamePrefix: scm
            usernamePrefix: scmu
        }

        ACTIVITYMONITOR {
            name: amont
            databaseServerName: rds-mysql-prod1
            databaseNamePrefix: amon
            usernamePrefix: amonu
        }

        REPORTSMANAGER {
            name: rmant
            databaseServerName: rds-mysql-prod1
            databaseNamePrefix: rman
            usernamePrefix: rmanu
        }

        NAVIGATOR {
            name: navt
            databaseServerName: rds-mysql-prod1
            databaseNamePrefix: nav
            usernamePrefix: navu
        }

        # Available in Cloudera Manager 5.2+
        NAVIGATORMETASERVER {
            name: navmst
            databaseServerName: rds-mysql-prod1
            databaseNamePrefix: navms
            usernamePrefix: navmsu
        }
    }
...
}

The same applies to the Cluster section shown below. The databaseTemplates for HUE and OOZIE may be specified similarly to HIVE and SENTRY.

cluster {

...

    #
    # 3. Optional configuration for creating external databases on the fly for Hive Metastore or Sentry
    # database
    #

    databaseTemplates: {
        HIVE {
            name: hivet
            databaseServerName: rds-mysql-prod1 # Must correspond to an external database server named above
            databaseNamePrefix: hive
            usernamePrefix: hiveu
        },
        SENTRY {
            name: sentryt
            databaseServerName: rds-mysql-prod1
            databaseNamePrefix: sentry
            usernamePrefix: sentryu
        }
    }
...
}
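As a rough sketch of what "similarly" could mean for HUE and OOZIE, the fragment below follows the same pattern as the HIVE and SENTRY templates. The template names and prefixes (huet, hue, hueu, ooziet, oozie, oozieu) are hypothetical values chosen to match the naming style above, and you should confirm against the aws.reference.conf shipped with your Director version that these service keys are accepted.

```
cluster {

    databaseTemplates: {
        HUE {
            name: huet                             # hypothetical template name
            databaseServerName: existingpostgres1  # must match a block name under databaseServers
            databaseNamePrefix: hue                # hypothetical prefix
            usernamePrefix: hueu                   # hypothetical prefix
        },
        OOZIE {
            name: ooziet                           # hypothetical template name
            databaseServerName: existingpostgres1
            databaseNamePrefix: oozie              # hypothetical prefix
            usernamePrefix: oozieu                 # hypothetical prefix
        }
    }

}
```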

Also refer to the documentation for more information.

http://www.cloudera.com/documentation/director/latest/topics/director_external_db_using.html

 

David

 

Explorer
Posts: 22
Registered: ‎02-17-2015

Re: Production deployment reuse external database?

I understand that Director can provision databases on a Postgres DB.

My second question still stands:

Should I use a DB for Cloudera Director that I also reuse for Cloudera Manager, HUE and Oozie?

As for my first question:

If you check /etc/cloudera-director-server/application.properties, I don't see Postgres listed.

Can you check whether Cloudera Director itself can be hosted on a Postgres DB?

 

#
# Configurations for database connectivity.
#

# Optional database type (h2 or mysql) (defaults to h2)
# lp.database.type: mysql
Cloudera Employee
Posts: 76
Registered: ‎10-28-2014

Re: Production deployment reuse external database?

MrBee,

 

Cloudera Director does not support Postgres for its own database.

See http://www.cloudera.com/documentation/director/latest/topics/director_deployment_requirements.html for details.

 

You can re-use the same DB server for Director, CM, and CDH if you wish (provided it is MySQL or MariaDB). I don't see any problems with this, since separate databases will be created for each CM and CDH service. I think that the database name will be uniquely generated based on the name specified in the database templates, but you should double-check that before bootstrapping a second CM or CDH cluster. Also be mindful of the lp.database.name field if you wish to run multiple Directors.
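As a hedged sketch, pointing Director itself at a shared MySQL server might look roughly like this in /etc/cloudera-director-server/application.properties. Only lp.database.type and lp.database.name are mentioned in this thread; the host, port, username and password keys are my assumption about the neighbouring settings, and all values are placeholders, so check them against the comments in your own file.

```
# Director's own database (h2 or mysql)
lp.database.type: mysql

# Assumed connection settings; placeholder values
lp.database.host: 10.0.0.10
lp.database.port: 3306
lp.database.username: director
lp.database.password: REPLACE-ME

# Use a distinct name per Director instance when several share one server
lp.database.name: director1
```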

 

I don't feel that I'm qualified to answer whether you SHOULD re-use the same database for Director, Manager, and CDH as that is dictated by your own requirements.