Support Questions


CDH5.1.3 Cloudera Manager restore on new server from files or postgresql database

Explorer

I've tried restoring both from an scm PostgreSQL database backup and from a file backup of the config and postgres data (with the database stopped for the backup). Both fail, complaining about a missing column:

 

Caused by: org.postgresql.util.PSQLException: ERROR: column dbrole0_.config_staleness_status does not exist

 

Has anyone successfully restored CM on a new server?

 

Thanks, John.


5 REPLIES

Super Collaborator

Are you restoring to the exact same version of Cloudera Manager on the new server as the version from which the backup was taken?
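For a quick check (assuming an RPM-based install; the package names below are the standard CM 5 ones, use dpkg -l on Debian/Ubuntu), the installed versions can be compared on both servers:

# Run on both the old and the new server; the output must match exactly
rpm -q cloudera-manager-server cloudera-manager-daemons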

Explorer

Yes. I am testing the backup procedure before deploying it in production. I've repeated it numerous times and get the same error. I upgraded from 5.0.0 to 5.1.3 first, then backed up the database.

ACCEPTED SOLUTION

Super Collaborator

It sounds like something is most likely out of sequence. I could see this working fine (and have performed it numerous times) as long as the sequence is maintained.

 

So, 

  • Install CM on node1
  • Upgrade CM on node1
  • Run CM through all upgrade wizards, which upgrades all hosts in cluster1 heartbeating to the node1 CM Server.
  • Stop all agents
  • Stop CM Server
  • Take a [pg_dump|mysqldump] of the CM database
  • Copy the [pg_dump|mysqldump] of the CM database to node2
  • Reimport the consistent database export into the same version of the RDBMS on node2. Ensure GRANT statements also apply.
  • scp /etc/cloudera-scm-server/db.properties from node1 to node2 (assuming you've set appropriate GRANTs for the db import on node2 using the same name / pw but allowing the new hostname node2), OR
  • alter /etc/cloudera-scm-server/db.properties on node2 to ensure the correct user/pw/hostname are specified, depending on how you've issued GRANTs / permissions.
  • service cloudera-scm-server start && tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
  • Sanity-check that the server process comes up; once port :7180 is bound, attempt login on node2:7180.
  • With the UI up, configuration should all be in place for all clusters/services/roles, but no agents heartbeating.
  • Alter /etc/cloudera-scm-agent/config.ini on all managed hosts. Change "server_host=node1" to "server_host=node2", save, and apply across all nodes (see the sketch after this list).
  • Restart all agents:   # for i in $(cat file_listing_all_managed_cluster_nodes); do ssh root@$i 'service cloudera-scm-agent restart'; done
  • At this point all agents should be heartbeating to node2, which contains an exact clone of the configuration from node1.
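A minimal sketch of those last two agent steps in one pass (the host list file is the same hypothetical file_listing_all_managed_cluster_nodes as above; root SSH access is assumed):

# Repoint every managed host's agent at node2, then restart it
for i in $(cat file_listing_all_managed_cluster_nodes); do
  ssh root@$i "sed -i 's/^server_host=node1/server_host=node2/' /etc/cloudera-scm-agent/config.ini && service cloudera-scm-agent restart"
done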

Given a consistent backup from this mythical node1, I don't see how a column can go missing unless some specific [pg_dump|mysqldump] export options were applied, or some sort of import exclusion happened. What syntax did you use on both the export and the import side for the RDBMS? Also, which RDBMS are you using, specifically?
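For reference, a plain-SQL export/import pair against the embedded PostgreSQL would look something like the following (port 7432 and the cloudera-scm user are the embedded-database defaults; adjust for your setup):

# On node1, with the CM Server stopped
pg_dump -h localhost -p 7432 -U cloudera-scm --file=scm_dump.sql scm

# On node2, into a freshly created scm database on the same RDBMS version
psql -h localhost -p 7432 -U cloudera-scm -d scm -f scm_dump.sql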

 

Thanks,

--

Mark Schnegelberger

Explorer

I didn't stop all the agents, just CM. I initially tried restoring just scm, then amon, nav, rman, and scm, but it failed with the same problem. It's a local PostgreSQL database. It is odd that a file restore also fails, since in that case all the config and the database should be consistent.

 

On cm1, I tried with --clean and without it (--clean just makes the dump drop existing tables first):

 

export PGPASSWORD=passwordfromfile
pg_dump -h localhost -p 7432 -U cloudera-scm --clean --file=csm_dump_amon.sql amon
pg_dump -h localhost -p 7432 -U cloudera-scm --clean --file=csm_dump_nav.sql nav
pg_dump -h localhost -p 7432 -U cloudera-scm --clean --file=csm_dump_rman.sql rman
pg_dump -h localhost -p 7432 -U cloudera-scm --clean --file=csm_dump_scm.sql scm

 

On cm2 (built automatically in Amazon OpsWorks, so it has a blank CM config):

 

sudo service cloudera-scm-server stop

dropdb -h localhost -p 7432 -U cloudera-scm 'scm'
createdb -h localhost -p 7432 -U cloudera-scm 'scm'

export PGPASSWORD=newpasswordfromfile
psql -h localhost -p 7432 -U cloudera-scm -d amon -f csm_dump_amon.sql
psql -h localhost -p 7432 -U cloudera-scm -d nav -f csm_dump_nav.sql
psql -h localhost -p 7432 -U cloudera-scm -d rman -f csm_dump_rman.sql
psql -h localhost -p 7432 -U cloudera-scm -d scm -f csm_dump_scm.sql

 

sudo service cloudera-scm-server start

 

Java runs for a minute, then dies. From /var/log/cloudera-scm-server/db.log:

 

ERROR:  column dbrole0_.config_staleness_status does not exist at character 417
STATEMENT:  select dbrole0_.ROLE_ID as ROLE1_28_0_, dbrole0_.OPTIMISTIC_LOCK_VERSION as OPTIMIST2_28_0_, dbrole0_.NAME as NAME3_28_0_, dbrole0_.HOST_ID as HOST4_28_0_, dbrole0_.ROLE_TYPE as ROLE5_28_0_, dbrole0_.CONFIGURED_STATUS as CONFIGUR6_28_0_, dbrole0_.MERGED_KEYTAB as MERGED7_28_0_, dbrole0_.MAINTENANCE_COUNT as MAINTENA8_28_0_, dbrole0_.DECOMMISSION_COUNT as DECOMMIS9_28_0_, dbrole0_.HAS_EVER_STARTED as HAS10_28_0_, dbrole0_.CONFIG_STALENESS_STATUS as CONFIG11_28_0_, dbrole0_.SERVICE_ID as SERVICE12_28_0_, dbrole0_.ROLE_CONFIG_GROUP_ID as ROLE13_28_0_ from ROLES dbrole0_ where dbrole0_.ROLE_ID in ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)
ERROR:  current transaction is aborted, commands ignored until end of transaction block
STATEMENT:  SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM, c.relname AS TABLE_NAME, CASE n.nspname ~ '^pg_' OR n.nspname = 'information_schema' WHEN true THEN CASE WHEN n.nspname = 'pg_catalog' OR n.nspname = 'information_schema' THEN CASE c.relkind WHEN 'r' THEN 'SYSTEM TABLE' WHEN 'v' THEN 'SYSTEM VIEW' WHEN 'i' THEN 'SYSTEM INDEX' ELSE NULL END WHEN n.nspname = 'pg_toast' THEN CASE c.relkind WHEN 'r' THEN 'SYSTEM TOAST TABLE' WHEN 'i' THEN 'SYSTEM TOAST INDEX' ELSE NULL END ELSE CASE c.relkind WHEN 'r' THEN 'TEMPORARY TABLE' WHEN 'i' THEN 'TEMPORARY INDEX' WHEN 'S' THEN 'TEMPORARY SEQUENCE' WHEN 'v' THEN 'TEMPORARY VIEW' ELSE NULL END END WHEN false THEN CASE c.relkind WHEN 'r' THEN 'TABLE' WHEN 'i' THEN 'INDEX' WHEN 'S' THEN 'SEQUENCE' WHEN 'v' THEN 'VIEW' WHEN 'c' THEN 'TYPE' ELSE NULL END ELSE NULL END AS TABLE_TYPE, d.description AS REMARKS FROM pg_catalog.pg_namespace n, pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_description d ON (c.oid = d.objoid AND d.objsubid = 0) LEFT JOIN pg_catalog.pg_class dc ON (d.classoid=dc.oid AND dc.relname='pg_class') LEFT JOIN pg_catalog.pg_namespace dn ON (dn.oid=dc.relnamespace AND dn.nspname='pg_catalog') WHERE c.relnamespace = n.oid AND c.relname LIKE 'PROBABLYNOT' AND (false OR ( c.relkind = 'r' AND n.nspname !~ '^pg_' AND n.nspname <> 'information_schema' ) ) ORDER BY TABLE_TYPE,TABLE_SCHEM,TABLE_NAME
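For comparison, whether the restored scm schema actually contains the column can be checked directly (port 7432 and the cloudera-scm user match the commands above; PostgreSQL folds the unquoted ROLES identifier to lowercase, hence 'roles'):

psql -h localhost -p 7432 -U cloudera-scm -d scm -c "SELECT column_name FROM information_schema.columns WHERE table_name = 'roles' ORDER BY column_name;"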

 

Explorer

I've just checked the versions and yes, they are different, so no wonder it fails! OpsWorks normally installs the latest version automatically, but I'd overridden that for the upgrade testing. Thanks, John.
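For what it's worth, one way to stop an automated build from drifting to the latest release is to pin the CM yum repo to the exact version before installing (assuming an RHEL/CentOS 6 host; the baseurl below follows the standard CM 5 archive layout, adjust the OS path for your platform):

# /etc/yum.repos.d/cloudera-manager.repo -- pinned to 5.1.3 instead of the generic cm/5/ path
[cloudera-manager]
name=Cloudera Manager
baseurl=https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.1.3/
gpgkey=https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera
gpgcheck=1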