Contributor
Posts: 66
Registered: ‎12-24-2015

replacement of ec2 machine


We are able to run a Cloudera cluster successfully using Director. Great tool indeed.

We want to prepare steps for replacing an unhealthy EC2 machine, or simply upgrading an EC2 machine to a more powerful one.

 

What are the standard steps for doing this in the cases below, without risk of corrupting the cluster state?

 

  •  Replacing the Cloudera Manager EC2 machine.
  •  Replacing any master/worker EC2 machine.

 

Thanks !!

Regards,

Kartik

Cloudera Employee
Posts: 17
Registered: ‎07-11-2016

Re: replacement of ec2 machine

Hello,

 

To replace an EC2 instance with a more powerful instance, do one of the following:

 

1. Power down the EC2 instance.

2. In the EC2 management console, right click the instance and choose "Change Instance Type."

 

OR

 

1. Create an image (AMI) of the EC2 instance.

2. Launch an instance from the AMI.

 

OR

 

1. Create a snapshot of the EC2 instance's volume.

2. Create a volume from the snapshot.

3. Provision a larger EC2 instance.

4. Power down the new, larger EC2 instance.

5. Detach the existing volume from the new EC2 instance.

6. Attach the volume created from the snapshot to the new EC2 instance.
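
The first option (stop the instance, change its type, start it again) can also be scripted. Below is a minimal sketch, assuming the `ec2` argument is a boto3 EC2 client (for example `boto3.client("ec2")`) and that the instance is EBS-backed; the function name `resize_instance` and the instance ID and type in the usage line are illustrative only.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an EBS-backed instance, change its type, and start it again.

    `ec2` is assumed to be a boto3 EC2 client; it is passed in so the
    caller controls region and credentials.
    """
    # The instance type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

Usage would look like `resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "m4.2xlarge")`. Stopping the instance means downtime for whatever roles run on it, so this only suits hosts that can be taken offline.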

Contributor
Posts: 66
Registered: ‎12-24-2015

Re: replacement of ec2 machine


Thank you. Those are the standard options, but in a Cloudera context with instance-store machines (i2/d2 EC2 types), most of them will not work, because we have instance storage rather than EBS. Second, we cannot stop any machine in a production cluster to resize or replace it.

 

Only the option below looks good to me.

 

1. Create an image (AMI) of the EC2 instance.

2. Launch an instance from the AMI.

 

I would like to know: if we create a new AMI from a master/worker node, launch an instance from it, and attach that machine to Cloudera Manager, will it be able to take over the services as a replacement?

 

Cloudera Manager runs on an m4.xlarge machine backed by an EBS volume. I believe we will be able to restore it by following the standard EBS guidelines.

Cloudera Employee
Posts: 17
Registered: ‎07-11-2016

Re: replacement of ec2 machine

For the worker nodes (e.g., DataNodes), it would be better to add a larger EC2 instance to the cluster, replicate the data to that instance, and add (or transfer) any roles. Once the smaller EC2 instance is no longer needed, simply decommission and remove it.

 

Cloudera Employee
Posts: 43
Registered: ‎02-18-2014

Re: replacement of ec2 machine

Hello there,

 

There are three distinct replacement scenarios being contemplated here.

 

The easiest is replacing a worker, such as a node that hosts only, say, a NodeManager and a DataNode. Rufus's guidance is correct: add a new worker built from the larger instance type, with the AMI of your choice, and allow HDFS replication to do its work. After that it is safe to remove the older, smaller workers. It is best to do this through Director, using its grow/shrink capability, so that Director remains in sync with Cloudera Manager and the true cluster state. You will need to define a new instance template for the larger worker and a new instance group with roles matching the older workers. Documentation on the process is here:

 

https://www.cloudera.com/documentation/director/latest/topics/director_ui_cluster_shrink.html
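
Before shrinking, it can help to confirm that HDFS has finished re-replicating. Below is a minimal sketch, assuming you have captured the text output of `hdfs dfsadmin -report` and that it contains the usual "Under replicated blocks: N" summary line; the function names are illustrative and not part of Director or Cloudera Manager. A zero count is a sanity check only and does not replace decommissioning the host properly.

```python
import re


def under_replicated_blocks(report_text):
    """Return the first 'Under replicated blocks' count found in the report."""
    match = re.search(
        r"^\s*Under replicated blocks:\s*(\d+)", report_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Under replicated blocks' line found in report")
    return int(match.group(1))


def replication_caught_up(report_text):
    """True when the report shows no under-replicated blocks."""
    return under_replicated_blocks(report_text) == 0
```

The report text could be captured with, for example, `subprocess.run(["hdfs", "dfsadmin", "-report"], capture_output=True, text=True).stdout` on a host with an HDFS gateway role.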

 

Replacing a master is more difficult, and I don't believe we have a way to do that within Director. I could be wrong, though, so I will check on that. There is some support for replacing failed HDFS masters, but it is in the context of an HA (highly available) cluster.

 

Replacing a Cloudera Manager instance is possible, but it is complex. We currently reserve that procedure for our support personnel to execute.

 

Generally, if you find that your needs have outgrown your master nodes and CM instances, you should create new deployments and clusters and move the data out of the old clusters, using distcp or other appropriate tools.

Contributor
Posts: 66
Registered: ‎12-24-2015

Re: replacement of ec2 machine

Thanks, Bill. I am now determined to keep backups of each service's data using that service's own backup capabilities.

Contributor
Posts: 66
Registered: ‎12-24-2015

Re: replacement of ec2 machine

Hi Bill,

 

I tried to replace a worker and was not successful at all. I followed the steps below. Please advise what went wrong here.

 

  • I had 3 workers running DataNode and RegionServer roles. [successful]
  • Added one more worker via the Director client update command. [successful]
  • Ran the HDFS rebalance command after that. [successful]
  • Of the 4 workers, selected one in Cloudera Manager. [successful]
  • Stopped its roles and pressed the decommission button in Cloudera Manager. [successful]
  • Ran the update command with the worker count reduced to 3. [It never completed and ended up running for several hours.]

 

 

Regards,

Kartik

 

 

New Contributor
Posts: 4
Registered: ‎05-04-2017

Re: replacement of ec2 machine

@Bill Havanki

Can you please let me know the steps to perform in case the Cloudera Director instance is down, or if we need to replace it?

@kartikbha: Did you try the master node replacement? If so, can you please share the steps you performed to replace the EC2 instance of a master/CM node?

 

Regards,

Tauqeer Khan

Contributor
Posts: 66
Registered: ‎12-24-2015

Re: replacement of ec2 machine

When migrating a NameNode, the co-located Failover Controller should be migrated with it; the two go together.
A JournalNode can be migrated either separately or together with the NameNode.

 

Requirements:
This procedure requires cluster downtime. (Do not stop the cluster yourself; just make sure that no one is working on it.)

Do the following before you run the wizard:
• On hosts running the active and standby NameNodes, back up the data directories.
• On hosts running JournalNodes, back up the JournalNode edits directory.
• If the source host is not functioning properly or is not reliably reachable, decommission the host. (Do this only if the NameNode, Failover Controller, and JournalNode are the only roles on the source host; otherwise the other roles on that host are affected.)

Running the Migrate Roles Wizard
• If the host to which you want to move the NameNode is not in the cluster, follow the instructions in Adding a Host to the Cluster to add the host.

• Go to the HDFS service.
• Click the Instances tab.
• Click the Migrate Roles button.
• Click the Source Host text field and specify the host running the roles to migrate. In the Search field optionally enter hostnames to filter the list of hosts and click Search.

• Click the Destination Host text field and specify the host to which the roles will be migrated. On destination hosts, indicate whether to delete data in the NameNode data directories and JournalNode edits directory. If you choose not to delete data and such role data exists, the Migrate Roles command will not complete successfully.

• Acknowledge that the migration process incurs service unavailability by selecting the "Yes, I am ready to restart the cluster now" checkbox.

• Click Continue. The Command Progress screen lists each step in the migration process.

• When the migration completes, click Finish.
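
The backup steps listed before the wizard can be scripted. Below is a minimal sketch that archives one directory into a timestamped tarball; the function name and the example paths in the usage note are assumptions, so substitute the NameNode data directories and JournalNode edits directory configured for your cluster.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(src_dir, dest_dir):
    """Archive src_dir into dest_dir as <name>-<timestamp>.tar.gz and return the path."""
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{src.name}-{stamp}.tar.gz"

    # arcname keeps paths inside the archive relative to the directory name.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)
    return archive
```

For example, `backup_directory("/dfs/nn", "/backup/hdfs")` on each NameNode host and `backup_directory("/dfs/jn", "/backup/hdfs")` on each JournalNode host, where both source paths are placeholders for whatever your HDFS configuration actually uses.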

Cloudera Employee
Posts: 43
Registered: ‎02-18-2014

Re: replacement of ec2 machine

Hello Tauqeer,

 

If Director is down, then your clusters will still function normally. If you are using pay-as-you-go licensing (a.k.a. usage based billing), then Director should be kept running so that it can gather billing data. Otherwise, you can leave Director shut off unless you want to use it.

 

If you need to replace your Director instance, you can install Director on a new instance and then provide it access to the old Director's database. For MySQL, that involves the usual configuration for Director. For H2, you must copy the state.h2.db file, usually in /var/lib/cloudera-director-server/state.h2.db, from the old Director instance to the new one. For either scenario, you can also restore the database from a backup if necessary.
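
For the H2 case, copying the database file is easy to automate. Below is a minimal sketch that makes a timestamped copy of the file; the function name and the destination directory are illustrative, and it assumes the Director server is stopped while the copy is taken so the file is not being written to.

```python
import shutil
import time
from pathlib import Path

# Default location mentioned above; adjust if your installation differs.
DEFAULT_STATE_FILE = "/var/lib/cloudera-director-server/state.h2.db"


def copy_director_state(src=DEFAULT_STATE_FILE, dest_dir="/tmp/director-backup"):
    """Copy the Director H2 database file to dest_dir with a timestamp suffix."""
    src_path = Path(src)
    if not src_path.is_file():
        raise FileNotFoundError(f"{src_path} does not exist")

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest / f"{src_path.name}.{stamp}"

    # copy2 preserves timestamps and permission bits along with the contents.
    shutil.copy2(src_path, target)
    return target
```

The resulting copy can then be transferred to the new Director instance and placed at the original path (without the timestamp suffix) before starting Director there.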

 

If you made any custom configuration changes to the old Director instance, in the application.properties file, then you should also carry them over to the new instance.

 

Director does not have a mechanism to automatically learn about clusters that are already in existence, either built by a prior Director instance or manually. That's one reason it's important to have database backups.
