10-26-2015 01:14 AM
I finally got this working, so I am posting my steps in case they help someone who runs into the same problem. I use CDH 5.3.1.

The main steps are: create a new CDH manager and reconfigure all parameters and roles in it, add every process_id to the processes table in the scm database, then change server_host in /etc/cloudera-scm-agent/config.ini to point to the new manager and restart all agents.

First of all, back up and prepare for the worst. CDH provides two ways to back up: dump the database, or export the configuration to a JSON file.

1.1 Back up the database:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-2-x/topics/cm_ag_backup_dbs.html

For example:

    backup:  pg_dump -h localhost -p 7432 -U scm -W -F c -b -v -f "scm_db.db" scm
    restore: pg_restore -p 7432 -U scm -W -d scm -v scm_db.db

1.2 Write the configuration to a JSON file:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cm_intro_api.html#xd_583c10bfdbd326ba--7f25092b-13fba2465e5--7f20

For example:

    export: curl -u admin:admin "http://localhost:8888/api/v9/cm/deployment" > ~/cmf_config.json
    import: curl --upload-file ~/cmf_config.json -u admin:admin "http://localhost:8888/api/v9/cm/deployment?deleteCurrentDeployment=true"

If you did not back up and have lost the database, it is hard to restore, but it can be done. These are my steps:

1. Install a new CDH manager on another machine with a different host name.

2. Export a JSON configuration file from another, currently working CDH manager of the same version. That manager should include all the services you need (such as HDFS, YARN, HBase, HDFS HA, etc.).

3. Run a script on every machine to collect its hostid and hostname. The hostid is stored in the file /var/lib/cloudera-scm-agent/uuid.

4. Modify the JSON file from step 2 as follows:
   4.1 Delete all hosts in the JSON file and add the hostid and hostname of every host from step 3.
   4.2 Delete all roles in the clusters' services.
   4.3 Change the cluster name to the old cluster ID. If you do not remember the old cluster ID, you can get it from the HDFS NameNode HTTP web page.

5. Import the new JSON file into the newly created CDH manager.

6. Reconfigure all service parameters and add all instances to the services as before. (Do not use host templates, because the agents are not yet reporting to the new CMF server; you can, however, add instances from each service.)

7. If you have HDFS HA enabled, export the JSON file again, add the HDFS HA roles to it, and import it again.

8. If you do not want to stop the services, follow steps 9 and 10 to collect the process_id of every process on every host. If you can stop the services, skip to step 11.

9. Run the following script on all hosts to get the process_id and service name of each process:

    grep "spawned:.*with pid" /var/log/cloudera-scm-agent/supervisord.log |awk -vhost=$HOSTNAME '{ idx=index($5,"-"); name=substr($5,idx+1,length($5)-idx-1);pid=substr($5,2,idx-2);cc[name]=pid;}END{a=host;for (b in cc){ a=a"\t"b"\t"cc[b]} print a}'

10. Parse that output and insert a record for each process into the processes table of the scm database. Before inserting into the processes table, you have to insert another record into the commands table to get a new command_id. (This step is hard.)

11. On every host, change server_host in /etc/cloudera-scm-agent/config.ini to the new CDH manager, kill the cmf listener, and restart the cmf agent.

Everything should then be fine: the agents report to the new CDH server, and the server returns the same process_id to each agent, so the running processes are not killed.
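To make step 4 more concrete, here is a minimal Python sketch of the JSON edit. It is not the script I actually used. The field names (`hosts`, `hostId`, `hostname`, `clusters`, `name`, `services`, `roles`) are my assumption of how the exported deployment file is laid out, and the function name `rebuild_deployment` is made up for illustration. Check the names against your own export before relying on it; a real host entry may need more fields than the two shown here.

```python
import copy


def rebuild_deployment(template, hosts, old_cluster_name):
    """Return an edited copy of an exported deployment (steps 4.1 to 4.3).

    template         -- dict loaded from the JSON file exported in step 2
    hosts            -- list of (hostid, hostname) pairs collected in step 3
    old_cluster_name -- the old cluster ID to restore (step 4.3)

    Assumption: the export holds exactly one cluster and uses the key
    names shown below; verify both against your own file.
    """
    deployment = copy.deepcopy(template)  # leave the loaded template untouched

    # 4.1 Replace the template's hosts with the hosts of the lost cluster.
    deployment["hosts"] = [
        {"hostId": host_id, "hostname": hostname} for host_id, hostname in hosts
    ]

    for cluster in deployment.get("clusters", []):
        # 4.3 Rename the cluster to the old cluster ID.
        cluster["name"] = old_cluster_name
        for service in cluster.get("services", []):
            # 4.2 Drop all roles; they are added back by hand in step 6.
            service["roles"] = []

    return deployment
```

You would load the exported file with `json.load`, pass the result through this function, and write it back with `json.dump` before importing it in step 5.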
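The awk one-liner in step 9 is hard to read, so here is a rough Python equivalent of what it extracts. It assumes supervisord.log lines look like `... INFO spawned: '57-hdfs-DATANODE' with pid 4242`, where the number before the first dash is the process_id and the rest is the process name. That sample line is my illustration, not copied from a real log, and the function name `latest_process_ids` is made up. Unlike the awk version, this sketch skips entries whose name does not start with a number and a dash.

```python
import re

# Assumed line format (illustrative, check against your own supervisord.log):
#   2015-10-26 01:00:00,000 INFO spawned: '57-hdfs-DATANODE' with pid 4242
SPAWNED = re.compile(r"spawned: '(\d+)-([^']+)' with pid \d+")


def latest_process_ids(log_lines):
    """Map each process name to the process_id of its most recent spawn."""
    latest = {}
    for line in log_lines:
        match = SPAWNED.search(line)
        if match:
            process_id, name = match.groups()
            # Later lines overwrite earlier ones, like cc[name]=pid in awk.
            latest[name] = int(process_id)
    return latest
```

Run it over `open("/var/log/cloudera-scm-agent/supervisord.log")` on each host; the resulting name and process_id pairs are what step 10 needs for the processes table.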