New Contributor
Posts: 2
Registered: ‎06-14-2017

Zookeeper ID gets unset in ClouderaManager

Hi, we have a Hadoop cluster in AWS managed by Cloudera Manager. Our cluster consists of about 50 "static" nodes running the services below:

 

HDFS

* NameNode 

* DataNode

* Failover Controller

* Gateway

* HttpFS

* JournalNode

 

 

HBase

* Thrift Server

* Master

* RegionServer

 

YARN

* JobHistory Server

* NodeManager

* ResourceManager

 

We recently started using Spot instances to deploy more YARN NodeManager nodes and to scale in or out based on the demand and capacity of running jobs. When Spot instances (YARN NodeManager) are terminated or new nodes are added to the cluster, we see a weird issue where the ZooKeeper service goes down, affecting the whole cluster. Basically the whole cluster becomes unhealthy and Cloudera Manager indicates that the ZooKeeper Server ID is unset. We have checked the ZooKeeper logs and they do not contain any errors or indicate any malfunction; the problem seems to be related to Cloudera Manager unsetting the ZooKeeper IDs.

 

When this happens, the step we take to fix it is to set the ZooKeeper ID manually in Cloudera Manager for each node. The ID config boxes are empty, and we just set the IDs to 1, 2, 3 according to the node number.
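
Before re-entering IDs by hand, it may help to confirm that the ID on each ZooKeeper host still matches the ensemble definition. In stock ZooKeeper, each server reads its ID from a `myid` file in its data directory, and the ensemble is listed as `server.N=host:port:port` lines in `zoo.cfg`. The following is a minimal Python sketch under those assumptions. The function names and hostnames are hypothetical, and where Cloudera Manager places these files on a managed host is not covered here.

```python
import re

# Matches lines such as "server.1=zk1.example.com:2888:3888" in zoo.cfg.
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def parse_ensemble(zoo_cfg_text):
    """Return {server_id: hostname} for every server.N entry in zoo.cfg text."""
    ensemble = {}
    for raw in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(raw.strip())
        if match:
            ensemble[int(match.group(1))] = match.group(2)
    return ensemble


def check_myid(zoo_cfg_text, myid_text, hostname):
    """Return a list of problems found for one ZooKeeper host (empty if none)."""
    ensemble = parse_ensemble(zoo_cfg_text)
    value = myid_text.strip()
    if not value.isdecimal():
        # An empty myid corresponds to the "ID is unset" symptom.
        return ["myid is empty or not a number"]
    server_id = int(value)
    if server_id not in ensemble:
        return [f"no server.{server_id} entry in zoo.cfg"]
    if ensemble[server_id] != hostname:
        return [f"server.{server_id} points at {ensemble[server_id]}, not {hostname}"]
    return []
```

Calling `check_myid` with the contents of `zoo.cfg`, the contents of `myid`, and the host's own name returns an empty list when the three agree. Any message in the list suggests the ID being typed back into Cloudera Manager should be checked against what the host had before.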

 

Another thing we noticed is that Cloudera Manager server is spiking in cpu utilization when the spot nodes are being added or removed.

 

Version: Cloudera Express 5.7.2 (#17 built by jenkins on 20160722-1347 git: 1ac5976e8ad8f16506c2db236aee83141915c44d)

Java VM Name: Java HotSpot(TM) 64-Bit Server VM

Java VM Vendor: Oracle Corporation

Java Version: 1.8.0_101

 

The parcel version currently deployed in the cluster is 5.7.2-1.cdh5.7.2.p0.18.

 

This is causing serious issues in our production cluster and we are considering moving to EMR. If anyone has had a similar experience or could point out where to look in order to troubleshoot and fix this, I would be very grateful.

 

Thanks in advance.

 

Posts: 642
Topics: 3
Kudos: 118
Solutions: 67
Registered: ‎08-16-2016

Re: Zookeeper ID gets unset in ClouderaManager

If it is CM, you should be able to see the changes made and potentially who is making them by viewing the configuration history. It is possible that you have found a bug.

"Another thing we noticed is that Cloudera Manager server is spiking in cpu utilization when the spot nodes are being added or removed."

This is because CM is doing quite a bit when adding and removing nodes and services. If you need to scale CM you could look at offloading the CMS services (specifically amon and rmon as they use an external DB) to another host.
New Contributor
Posts: 2
Registered: ‎06-14-2017

Re: Zookeeper ID gets unset in ClouderaManager

Thanks for the answer 

 

 

 

"If you need to scale CM you could look at offloading the CMS services (specifically amon and rmon as they use an external DB) to another host."

 

I couldn't find much information about offloading CMS services (and I believe you meant *rman* instead of rmon), but our Cloudera Manager is running on an EC2 instance and the DB is hosted separately in an RDS instance (PostgreSQL 9.4.7 / db.m4.large), if that's what you mean. If that's not the case, can you provide some links?

Posts: 642
Topics: 3
Kudos: 118
Solutions: 67
Registered: ‎08-16-2016

Re: Zookeeper ID gets unset in ClouderaManager

On the configuration screen for ZK, on the right-hand side, there is the 'History and Rollback' screen.

Those services that are using RDS can be easily moved to another EC2 instance; that is what I meant.