
CDH5.1.3 Cloudera Manager restore on new server from files or postgresql database

Explorer

I've tried both restoring from an scm PostgreSQL database backup and from a file backup of the config and postgres data (database stopped for the backup). Both fail, complaining about a missing column:

 

Caused by: org.postgresql.util.PSQLException: ERROR: column dbrole0_.config_staleness_status does not exist

 

Has anyone successfully restored CM on a new server?

 

Thanks, John.


5 REPLIES

Super Collaborator

Are you restoring to the exact same version of Cloudera Manager on the new server as the version from which the backup was taken?

Explorer

Yes, I am testing the backup procedure before deploying it live. I've repeated it numerous times and get the same error. I upgraded from 5.0.0 to 5.1.3 first, then backed up the database.

Super Collaborator

It sounds most likely that something's just out of sequence. I could see this working fine (and I have performed it numerous times) as long as the sequence is maintained.

 

So, 

  • Install CM on node1
  • Upgrade CM on node1
  • Run CM through all upgrade wizards, which upgrades all hosts in cluster1 heartbeating to the node1 CM Server.
  • Stop all agents
  • Stop CM Server
  • Take a [pg_dump|mysqldump] of the CM database
  • Copy the [pg_dump|mysqldump] of the CM database to node2
  • Re-import the consistent database export into the same version of the RDBMS on node2. Ensure GRANT statements are also applied.
  • scp /etc/cloudera-scm-server/db.properties from node1 to node2 (assuming you've set appropriate GRANTs for the db import on node2 using the same name/password but allowing the new hostname node2), OR
  • alter /etc/cloudera-scm-server/db.properties on node2 to ensure the correct user/password/hostname are specified, depending on how you've issued GRANTs/permissions.
  • service cloudera-scm-server start && tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
  • Sanity-check that the server process comes up; once port :7180 is bound, attempt login on node2:7180.
  • With the UI up, configuration should all be in place for all clusters/services/roles, but no agents will be heartbeating.
  • Alter /etc/cloudera-scm-agent/config.ini on all managed hosts. Change "server_host=node1" to "server_host=node2", save, and apply across all nodes.
  • Restart all agents:   # for i in $(cat file_listing_all_managed_cluster_nodes); do ssh root@$i 'service cloudera-scm-agent restart'; done
  • At this point all agents should be heartbeating to node2, which contains an exact clone of the configuration from node1. (A condensed shell sketch of this sequence follows below.)
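
A condensed shell sketch of the same sequence, purely for illustration. It assumes a PostgreSQL-backed CM on the embedded instance (port 7432, user cloudera-scm); the hostnames node1/node2 and the hosts file file_listing_all_managed_cluster_nodes are placeholders:

# --- on node1: quiesce everything, then export ---
for h in $(cat file_listing_all_managed_cluster_nodes); do ssh root@$h 'service cloudera-scm-agent stop'; done
service cloudera-scm-server stop
pg_dump -h localhost -p 7432 -U cloudera-scm --file=scm_backup.sql scm     # or mysqldump for a MySQL-backed CM
scp scm_backup.sql /etc/cloudera-scm-server/db.properties root@node2:/root/

# --- on node2: import into the same RDBMS version, then start CM ---
psql -h localhost -p 7432 -U cloudera-scm -d scm -f /root/scm_backup.sql   # GRANTs for node2 already applied
cp /root/db.properties /etc/cloudera-scm-server/db.properties              # or edit user/pw/host by hand
service cloudera-scm-server start
tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log               # wait for :7180 to bind

# --- on every managed host: repoint the agent and restart it ---
for h in $(cat file_listing_all_managed_cluster_nodes); do
  ssh root@$h "sed -i 's/^server_host=node1/server_host=node2/' /etc/cloudera-scm-agent/config.ini && service cloudera-scm-agent restart"
done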

Given a consistent backup from this mythical node1, I don't see how a column can go missing unless some specific [pg_dump|mysqldump] export options were applied, or some sort of import exclusion happened. What syntax did you use on both the export and the import side of things for the RDBMS? Also, which RDBMS are you using specifically?

 

Thanks,

--

Mark Schnegelberger

Explorer

I didn't stop all agents, just CM. I initially tried restoring just scm, then amon, nav, rman and scm, but it failed with the same problem. It's the local PostgreSQL database. It is odd that a file restore also fails, since then all config and the database would be consistent.

 

On cm1, I tried with --clean and without (--clean just drops the tables first if they exist):

 

export PGPASSWORD=passwordfromfile
pg_dump -h localhost -p 7432 -U cloudera-scm --clean --file=csm_dump_amon.sql amon
pg_dump -h localhost -p 7432 -U cloudera-scm --clean --file=csm_dump_nav.sql nav
pg_dump -h localhost -p 7432 -U cloudera-scm --clean --file=csm_dump_rman.sql rman
pg_dump -h localhost -p 7432 -U cloudera-scm --clean --file=csm_dump_scm.sql scm
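
For what it's worth, a hypothetical single-pass variant: pg_dumpall against the same embedded instance also emits the CREATE ROLE / GRANT statements that per-database pg_dump files don't carry over to a freshly built server (assuming the cloudera-scm role is allowed to read all four databases):

export PGPASSWORD=passwordfromfile
pg_dumpall -h localhost -p 7432 -U cloudera-scm --clean > cm_all_dump.sql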

 

On cm2, which is built automatically in Amazon OpsWorks and so has a blank CM config:

 

sudo service cloudera-scm-server stop

dropdb -h localhost -p 7432 -U cloudera-scm 'scm'
createdb -h localhost -p 7432 -U cloudera-scm 'scm'

export PGPASSWORD=newpasswordfromfile
psql -h localhost -p 7432 -U cloudera-scm -d amon -f csm_dump_amon.sql
psql -h localhost -p 7432 -U cloudera-scm -d nav -f csm_dump_nav.sql
psql -h localhost -p 7432 -U cloudera-scm -d rman -f csm_dump_rman.sql
psql -h localhost -p 7432 -U cloudera-scm -d scm -f csm_dump_scm.sql

 

sudo service cloudera-scm-server start
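
A hypothetical variant of the same import: drop and recreate all four target databases, not just scm, so each .sql file loads into an empty database regardless of whether --clean was used on the dump:

export PGPASSWORD=newpasswordfromfile
for db in amon nav rman scm; do
  dropdb   -h localhost -p 7432 -U cloudera-scm "$db"
  createdb -h localhost -p 7432 -U cloudera-scm "$db"
  psql -h localhost -p 7432 -U cloudera-scm -d "$db" -f "csm_dump_${db}.sql"
done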

 

Java runs for a minute, then dies. From /var/log/cloudera-scm-server/db.log:

 

ERROR:  column dbrole0_.config_staleness_status does not exist at character 417
STATEMENT:  select dbrole0_.ROLE_ID as ROLE1_28_0_, dbrole0_.OPTIMISTIC_LOCK_VERSION as OPTIMIST2_28_0_, dbrole0_.NAME as NAME3_28_0_, dbrole0_.HOST_ID as HOST4_28_0_, dbrole0_.ROLE_TYPE as ROLE5_28_0_, dbrole0_.CONFIGURED_STATUS as CONFIGUR6_28_0_, dbrole0_.MERGED_KEYTAB as MERGED7_28_0_, dbrole0_.MAINTENANCE_COUNT as MAINTENA8_28_0_, dbrole0_.DECOMMISSION_COUNT as DECOMMIS9_28_0_, dbrole0_.HAS_EVER_STARTED as HAS10_28_0_, dbrole0_.CONFIG_STALENESS_STATUS as CONFIG11_28_0_, dbrole0_.SERVICE_ID as SERVICE12_28_0_, dbrole0_.ROLE_CONFIG_GROUP_ID as ROLE13_28_0_ from ROLES dbrole0_ where dbrole0_.ROLE_ID in ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)
ERROR:  current transaction is aborted, commands ignored until end of transaction block
STATEMENT:  SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM, c.relname AS TABLE_NAME,  CASE n.nspname ~ '^pg_' OR n.nspname = 'information_schema'  WHEN true THEN CASE  WHEN n.nspname = 'pg_catalog' OR n.nspname = 'information_schema' THEN CASE c.relkind   WHEN 'r' THEN 'SYSTEM TABLE'   WHEN 'v' THEN 'SYSTEM VIEW'   WHEN 'i' THEN 'SYSTEM INDEX'   ELSE NULL   END  WHEN n.nspname = 'pg_toast' THEN CASE c.relkind   WHEN 'r' THEN 'SYSTEM TOAST TABLE'   WHEN 'i' THEN 'SYSTEM TOAST INDEX'   ELSE NULL   END  ELSE CASE c.relkind   WHEN 'r' THEN 'TEMPORARY TABLE'   WHEN 'i' THEN 'TEMPORARY INDEX'   WHEN 'S' THEN 'TEMPORARY SEQUENCE'   WHEN 'v' THEN 'TEMPORARY VIEW'   ELSE NULL   END  END  WHEN false THEN CASE c.relkind  WHEN 'r' THEN 'TABLE'  WHEN 'i' THEN 'INDEX'  WHEN 'S' THEN 'SEQUENCE'  WHEN 'v' THEN 'VIEW'  WHEN 'c' THEN 'TYPE'  ELSE NULL  END  ELSE NULL  END  AS TABLE_TYPE, d.description AS REMARKS  FROM pg_catalog.pg_namespace n, pg_catalog.pg_class c  LEFT JOIN pg_catalog.pg_description d ON (c.oid = d.objoid AND d.objsubid = 0)  LEFT JOIN pg_catalog.pg_class dc ON (d.classoid=dc.oid AND dc.relname='pg_class')  LEFT JOIN pg_catalog.pg_namespace dn ON (dn.oid=dc.relnamespace AND dn.nspname='pg_catalog')  WHERE c.relnamespace = n.oid  AND c.relname LIKE 'PROBABLYNOT'  AND (false  OR ( c.relkind = 'r' AND n.nspname !~ '^pg_' AND n.nspname <> 'information_schema' ) )  ORDER BY TABLE_TYPE,TABLE_SCHEM,TABLE_NAME
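
A quick sanity check to run on cm2 against the restored scm database (a sketch; it just asks PostgreSQL's information_schema whether the column the 5.1.3 server is selecting exists at all, and no rows back would mean the dump came from an older CM schema than the server now running):

export PGPASSWORD=newpasswordfromfile
psql -h localhost -p 7432 -U cloudera-scm -d scm -c \
  "SELECT table_name, column_name FROM information_schema.columns WHERE lower(column_name) = 'config_staleness_status';"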

 

Explorer

I've just checked the versions and yes, they are different, so no wonder it fails! OpsWorks normally installs the latest version automatically, but I'd overridden it for the upgrade testing. Thanks, John.
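
For completeness, a sketch of the checks that would have caught this earlier (the package queries assume RHEL/CentOS-style hosts; use dpkg -l on Debian/Ubuntu, and the exact release strings on offer depend on the repo you have configured):

# confirm both servers run the same Cloudera Manager release
for h in cm1 cm2; do
  echo "== $h =="
  ssh root@$h 'rpm -qa | grep cloudera-manager'
done

# if they differ, see which releases the repo offers and install the matching one on cm2
yum list available cloudera-manager-server --showduplicates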