
What if a node fails - recovery of that node

RedHat Linux 6.7

CDH 5.8.2 from parcels with Cloudera Manager 5.8.2

 

We have built a 3-node test cluster, including high availability for the NameNode, Hive Metastore, and HiveServer2, as shown on the "Hosts > Roles" page:

 

Cluster 1 - CDH 5

Hosts        Count  Roles
vm-54        1      M RS B DN JN G ICS ID ISS AP ES HM SM JHS NM RM S
vm-[55-56]   2      RS DN FC JN NN G HMS HS2 ID NM S

The table groups together hosts that have the same roles assigned to them.
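
For reference, this is roughly how we pull the same host-to-role mapping from the Cloudera Manager REST API (a sketch only: the URL, credentials, and API version below are placeholders and assumptions for our environment, where we expect CM 5.8 to serve API v13):

import requests

# Placeholders for our environment; adjust before running.
CM_URL = "http://vm-54:7180"
AUTH = ("admin", "admin")

# Each managed host carries roleRefs naming the role instances
# assigned to it, which is what the "Hosts > Roles" page groups.
resp = requests.get(CM_URL + "/api/v13/hosts?view=full", auth=AUTH)
resp.raise_for_status()

for host in resp.json()["items"]:
    roles = sorted(ref["roleName"] for ref in host.get("roleRefs", []))
    print(host["hostname"], "->", ", ".join(roles))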

 

vm-55 and vm-56 are essentially identical, with Cloudera Manager running on vm-54.

 

We wanted to know what happens if a node becomes corrupted and cannot be recovered.

 

We powered down vm-55, cloned it to a new vm-57, and deleted all data directories for Hadoop, ZooKeeper, etc. on vm-57.
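
The cleanup on vm-57 was along these lines (a sketch; the directory list is an assumption based on common CDH parcel defaults, and the authoritative paths come from each role's configuration in Cloudera Manager):

import os
import shutil

# Assumed default data directories for a parcel-based CDH install;
# check each role's configuration in CM for the actual values.
DATA_DIRS = [
    "/dfs/nn",             # NameNode metadata (dfs.namenode.name.dir)
    "/dfs/dn",             # DataNode blocks (dfs.datanode.data.dir)
    "/dfs/jn",             # JournalNode edits (dfs.journalnode.edits.dir)
    "/var/lib/zookeeper",  # ZooKeeper snapshots and transaction logs
]

for d in DATA_DIRS:
    if os.path.isdir(d):
        shutil.rmtree(d)   # wipe the clone's copy of the old node's state
        print("removed", d)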

 

We started vm-57, and it took on all of vm-55's roles right away.  We expected to have to manually assign roles to the new vm-57, but it had already picked them up automatically.

 

Was this because we missed some data files, or is it default behavior for Cloudera Manager when a new agent joins a cluster that has a missing server?
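
Our unconfirmed suspicion is that cloning carried over vm-55's agent identity, so Cloudera Manager treated vm-57 as the same host coming back. A sketch of how we would check, assuming the CM 5.x agent persists its identity in /var/lib/cloudera-scm-agent/uuid:

import os

# Assumed location of the agent's persistent host identity on CM 5.x;
# if the clone kept vm-55's UUID, CM would see vm-57 as the same host.
UUID_PATH = "/var/lib/cloudera-scm-agent/uuid"

if os.path.exists(UUID_PATH):
    with open(UUID_PATH) as f:
        print("agent uuid:", f.read().strip())
else:
    print("no persisted uuid; this agent would register as a new host")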

 

Where can we find documentation on how to replace a corrupted critical server in an HA configuration (such as one running FC, JN, NN, S, etc.)?

 

Thank you.
