Converting ZooKeeper nodes to unmanaged nodes


We would like to convert 3 nodes (these nodes have only ZooKeeper roles deployed on them) to unmanaged nodes by purging the Cloudera software. These 3 nodes form a ZooKeeper ensemble that was created a long time ago; it is not used by any of our existing CDP cluster services, only by some external edge querying applications. We now need to take them out of Cloudera Manager's management so they are not unnecessarily included in the licensing cost. How should I go about completing this task? I don't want to break the external application, but there is no reason for these nodes to be managed by CM. Do I have to rebuild the ensemble using the open-source software from Apache after purging the CM agent software?
9 Replies

Community Manager

@bkandalkar88, welcome to our community! To help you get the best possible answer, I have tagged our experts @Scharan, @willx, and @SVB, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager




@Vidya I am still waiting for someone to respond to my query.

Expert Contributor

Hello @bkandalkar88 

Let me answer your queries one by one.

1) As I read it, you are looking for guidance on converting the 3 CM-managed ZK nodes (which are not used by CM-managed cluster services) into unmanaged nodes running the ZK service for your external edge querying applications.

- Assuming you have two ZK services managed by CM (Cloudera Manager), you are trying to stop the additional/second ZK service that is only used by some edge query applications.

I am not sure how the ZK service is being used by the edge application or what is stored there, so I am assuming you understand the potential risks and consequences of moving from the CM-managed ZK service to an external ZK service.

While you can decommission the nodes from the CM-managed cluster after stopping the ZK service, I would recommend consulting Cloudera Support or experienced professionals before attempting such actions.

The steps for host decommissioning and removal from the cluster are described in our documentation here:
- https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/managing-clusters/topics/cm-decommission-host... 
- https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/managing-clusters/topics/cm-removing-host-fro... 

2) Once the nodes are removed from the cluster, you can refer to the ZK Getting Started Guide [1] to install the ZK service on them for use by your edge applications (you will need to update the applications to connect to the external ZK service). A minimal configuration sketch follows the reference below.

[1] ZooKeeper Getting Started Guide : https://zookeeper.apache.org/doc/r3.9.1/zookeeperStarted.html 
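
For illustration, a minimal replicated zoo.cfg for a 3-node ensemble might look like the following (the hostnames, ports, and dataDir path here are placeholders, not your actual values; adapt them to your environment):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1-fqdn:2888:3888
server.2=zk2-fqdn:2888:3888
server.3=zk3-fqdn:2888:3888

Each node also needs a myid file in dataDir containing just its own server number (1, 2, or 3).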

 

avatar

Thanks for your reply, @PabitraDas.

Let me add some more clarity here. We have a CM that manages multiple CDH and CDP clusters. One of those clusters is a standalone 5-node ZooKeeper cluster that is not used by any services of any of the clusters managed by this CM. This standalone 5-node ZooKeeper cluster was created a long time ago in Cloudera Manager by an admin who is no longer working with us; I am assuming he created the 5-node ensemble using CM simply because that was easier than doing it manually from the CLI.

We now want to decommission this 5-node ZooKeeper cluster completely and remove it from Cloudera Manager, but we do not want to lose the ZooKeeper data. The idea is to decommission these 5 nodes one by one and remove them from Cloudera Manager, then reinstall the ZooKeeper service on the same nodes from the CLI, since the ensemble will still be used by an external app. The question is: after we decommission 1 ZooKeeper node (starting with a current follower) from the 5-node ensemble, is it going to be an issue that we temporarily have 4 nodes still managed by Cloudera Manager and 1 unmanaged? (The ID in the myid file will be the same as before, and we are hoping that after the unmanaged ZK node comes back up, it will sync its data from the leader ZK instance.) We plan to keep this setup (4 managed and 1 unmanaged ZK node in the same ensemble) running for a couple of days, and if no issues are reported, proceed with decommissioning the remaining 4 nodes one after another.
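
For reference, this is roughly how I plan to identify the current leader and followers before picking the first node to decommission (assuming the client port is 2181 and four-letter-word commands are enabled):

for h in zk1-fqdn zk2-fqdn zk3-fqdn zk4-fqdn zk5-fqdn; do
  echo -n "$h: "; echo srvr | nc $h 2181 | grep Mode
done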

Once again, thanks for taking the time to respond to my question; I genuinely appreciate it.


[Attachment: standalone 5 node zk cluster.jpg]

Expert Contributor

Hello @bkandalkar88 

Thank you for the clarification and additional details about your plan.
Technically, the plan you proposed may work. However, it needs to be tested.

Since the CM-managed ZK ensemble is not used by any other service, only by the one application, you can follow the plan below.

1) Before decommissioning the ZK nodes, capture the ZK config files from the ZK process directory for reference. You can tar the files in "/var/run/cloudera-scm-agent/process/**-ZOOKEEPER-***/" and keep them safe, along with a backup of the ZK data directory contents (myid and version-2). Also update the /etc/hosts file on each ZK node, adding all 5 hostnames of the ZK ensemble, so that each host can resolve the peer ZK hostnames without DNS.
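
A rough sketch of such a backup (the process directory glob and the dataDir path are assumptions; confirm yours first):

# Back up the CM-generated ZK config (the process directory name varies per host)
tar czf /root/zk-process-backup.tar.gz /var/run/cloudera-scm-agent/process/*-ZOOKEEPER-*/
# Back up the ZK data directory; dataDir is commonly /var/lib/zookeeper, but confirm in zoo.cfg
tar czf /root/zk-data-backup.tar.gz /var/lib/zookeeper/myid /var/lib/zookeeper/version-2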

2) Decommission and remove 2 follower ZK nodes from the cluster. A ZK ensemble needs 3 or 5 nodes to maintain quorum; in a CM-managed cluster you can't remove just 1 node and keep 4 in the ensemble.

The remaining 3 ZK nodes (1 leader and 2 followers) will continue to run and serve client requests (applications).

3) Once the ZK nodes are removed from the CM-managed cluster, stop the agent on those nodes to disable CM control over the running ZK service. Verify the ZK service health from the command line and see whether all client requests are working fine; for example:
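
A sketch only (the agent service name and the client port are typical defaults, not verified for your hosts):

# Stop the CM agent so CM no longer supervises the node (systemd hosts)
systemctl stop cloudera-scm-agent
# Basic ZK health checks using four-letter words (client port assumed to be 2181)
echo ruok | nc zk1-fqdn 2181              # should return "imok"
echo srvr | nc zk1-fqdn 2181 | grep Mode  # should show leader or follower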

4) Verify that the zoo.cfg file and data directory contents on all 5 nodes are intact. If the ZK config was updated during the decommission process, replace it with the old zoo.cfg from the backup (suggested in Step 1); for example:
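
A sketch, with hypothetical paths matching the backup taken in Step 1:

# Compare the running config with the backed-up copy
diff /path/to/backup/zoo.cfg /path/to/current/zoo.cfg
# Confirm the data directory contents survived (dataDir assumed to be /var/lib/zookeeper)
ls -l /var/lib/zookeeper/myid /var/lib/zookeeper/version-2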

5) You need to start the ZK service without CM on the removed nodes (followers) from the CLI, using the bin/zkServer.sh script. Refer to the ZK commands to start the service here - https://gist.github.com/miketheman/6057930
or
- https://zookeeper.apache.org/doc/r3.3.3/zookeeperStarted.html#sc_InstallingSingleMode

command - bin/zkServer.sh start </path/zoo.cfg>

6) After the ZK service is started, ensure that the connections between the ZK nodes are working and that the quorum/peer port (3181), the leader-election port (4181), and the client port are in a listening state and working as expected; for example:
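
A sketch (port numbers as above, client port assumed to be 2181; adjust if your ensemble differs):

# Confirm the peer, election, and client ports are listening
ss -tlnp | grep -E ':2181|:3181|:4181'
# Confirm the node joined the ensemble rather than running standalone
echo stat | nc zk1-fqdn 2181 | grep Mode   # expect "leader" or "follower"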


@PabitraDas: Here are the steps I followed today to convert one of the ZK nodes into an unmanaged node. I am running into the issue described below.

1) Stopped the ZK server from Cloudera Manager and put the host under maintenance.

2) Updated the hosts file on all 5 ZK server nodes, mapping the IP addresses to the host FQDNs.

3) Backed up the data and config files.

4) From CM, decommissioned one of the hosts.

5) Stopped the CM agent on the decommissioned node.

6) Removed the host from CM.

7) The zoo.cfg file got updated with default values in this process, so I restored the file from the backup that has the ZK quorum as follows:

server.1=zk1-fqdn:3181:4181
server.2=zk2-fqdn:3181:4181
server.3=zk3-fqdn:3181:4181
server.4=zk4-fqdn:3181:4181
server.5=zk5-fqdn:3181:4181

8) Verified that the myid file on each host had a unique ID (so there is no conflict and they can all participate in the ensemble) and that the ZK data was intact.

9) Started ZK on the decommissioned node from the command line; as seen in the attached screenshot, the startup event shows that it is reading the configuration from the zoo.cfg that has the quorum addresses.

Though the ZooKeeper server starts from the command line, when I run the following command to get the "Mode" of this ZK server, it reports "standalone" instead of "follower":

echo "stat" | nc zk3-fqdn 2181 | grep Mode

Mode: standalone

Please refer to the attached screenshot below: nohup.out shows that ZooKeeper started successfully, and the corresponding zoo.cfg has the quorum details (similar to the ones on the other members of the quorum).

It looks like, although the startup command reads the configuration from the zoo.cfg file, the server still fails to become a member of the ensemble, and I am not sure why.


[Attachment: zookeeper fails to join the quorum.jpg]


@PabitraDas Any thoughts?

By the way, when I simply stop the ZK server from Cloudera Manager and attempt to start it from the command line, the script prints "STARTED", but the start attempt actually fails: I see nothing from "ps -ef | grep -i zookeeper". A "zookeeper.out" file is created in my current directory, and it shows the SASL error seen in the attached screenshots.

[Attachment: bkandalkar88_0-1707974659982.png]

[Attachment: bkandalkar88_1-1707974884469.png]

It looks like the attempt to start the ZooKeeper server is failing because of a SASL authentication failure, as seen in the error snippet above. This is how my jaas.conf looks:

root@fqdn:/tmp/zk4mcli# cat jaas.conf
Server {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="zookeeper.keytab"
storeKey=true
useTicketCache=false
principal="zookeeper/FQDN@REALM";
};

QuorumServer {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="zookeeper.keytab"
storeKey=true
useTicketCache=false
principal="zookeeper/FQDN@REALM";
};

QuorumLearner {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="zookeeper.keytab"
storeKey=true
useTicketCache=false
principal="zookeeper/FQDN@REALM";
};

Expert Contributor

Hello @bkandalkar88 

Thank you for sharing the details. 

As I mentioned in my first note, these steps need to be tested. However, reading your update, it seems the decommissioned ZK server is not able to communicate with the other ZK servers in the ensemble; the required authentication process appears broken.
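
A couple of checks you could still try, as a sketch only (hostnames and paths are taken from your earlier posts and may differ). First, confirm what configuration the running server actually loaded, using the conf four-letter word (your stat output suggests 4lw commands are enabled). Second, when zkServer.sh is run outside CM, the JVM has to be pointed at your jaas.conf explicitly; CM normally injects that flag into the managed process, so a manual start without it could produce the SASL failure you saw:

echo conf | nc zk3-fqdn 2181   # shows the config the running server actually loaded

# Point the server JVM at the JAAS file before starting (path assumed from your output)
export SERVER_JVMFLAGS="-Djava.security.auth.login.config=/tmp/zk4mcli/jaas.conf"
bin/zkServer.sh start /path/zoo.cfg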

So, I would suggest adding the host back to CM (recommission the node), then adding it back to the cluster with the ZK server role on it, to get back to the previous operational state.

Then, file a support case with Cloudera Support to assist with this requirement, if you have a valid support entitlement with extended support for CDH (since CDH is EOL). Support may reproduce the issue in-house and share the steps with you.

 

Community Manager

@bkandalkar88, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future. 



Regards,

Vidya Sargur,
Community Manager

