Apache ZooKeeper is a core infrastructure component in the Apache Hadoop stack and is widely used by many companies for service discovery, configuration management, etc. In a previous blog post, we described improving ZooKeeper security by enabling SASL Quorum Peer Mutual Authentication and Authorization in Cloudera Distribution of Hadoop (CDH). That was back in 2017 and since then the open source community has released Apache ZooKeeper 3.8.0 with important security improvements and new features. It’s the latest stable version included in the Cloudera Data Platform today.
In this blog post, we’ll focus on wire encryption: specifically, how to enable it and how Cloudera Manager helps secure all communication channels between ZooKeeper clients and servers. The previous blog post covered authentication and authorization of quorum members, which is a huge improvement in terms of security. However, to close the gaps entirely and prevent eavesdropping on your important data, encryption is just as important. More formally, the security requirement for encryption applies to two aspects of the ZooKeeper protocol: communication between the servers of the quorum, and communication between clients and servers.
We assume (see the previous blog post) that the reader is already familiar with technologies and concepts like SASL, Kerberos, quorum, and leader election. The recently added security features, Quorum TLS and Client TLS, are covered in the following sections.
First, let’s talk about encryption in general. Today, the de facto standard for encrypting electronic communication channels is TLS. Transport Layer Security (TLS) is a cryptographic protocol designed to provide communications security over a computer network. TLS 1.0 was first defined in RFC 2246 in January 1999 as an upgrade of SSL 3.0, and SSL 3.0 itself was deprecated in June 2015 by RFC 7568. Today, only TLS versions 1.2 and 1.3 are considered secure (TLS 1.0 and 1.1 were formally deprecated by RFC 8996 in 2021), and ZooKeeper 3.8.0 supports both.
TLS encryption is based on PKI (Public Key Infrastructure). In practice, it relies on two local stores on both the server and the client side. The keystore holds the key material that the server presents to the connecting party as proof of its identity. To be more precise: the server’s public and private key pair is stored in the keystore. The digitally signed public key, together with the identity it belongs to, is called the “certificate”, and it is sent to the client as the server’s ID. The key pair is also used during the initial negotiation of the symmetric key that the parties use for the rest of the communication: with classic RSA key exchange, the client encrypts the key material with the server’s public key, while with ECDHE cipher suites (like the one in the logs below) the server signs ephemeral Diffie-Hellman parameters with its private key. This is called the key exchange phase, and it’s needed because symmetric-key encryption algorithms are much faster than asymmetric ones.
On the flip side, the client needs to verify the authenticity of the server and the validity of the presented certificate. Hence, it needs a truststore, which contains a list of certificates that can be trusted. More precisely, it holds the public keys (usually in the form of CA certificates) with which the signature on the presented certificate can be verified. If the server presents an untrusted certificate, the client aborts the connection immediately.
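To make the key exchange phase more concrete, here is a minimal Java sketch using only the JDK’s standard crypto APIs. Modern cipher suites such as TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 derive the symmetric key with an ephemeral elliptic-curve Diffie-Hellman (ECDH) exchange, so the sketch has two parties generate ephemeral EC key pairs, agree on a shared secret, and then use an AES-128-GCM key derived from it. The class name KeyExchangeSketch is hypothetical, and the SHA-256 “key derivation” is a toy assumption: real TLS uses its own key schedule, authenticates the ephemeral keys with the server’s certificate, and manages nonces per record.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyAgreement;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: not part of ZooKeeper and not the real TLS key schedule.
class KeyExchangeSketch {

    // Each party generates a fresh (ephemeral) EC key pair: the "E" in ECDHE.
    static KeyPair ephemeralKeyPair() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("EC");
        gen.initialize(256); // 256-bit curve (P-256)
        return gen.generateKeyPair();
    }

    // ECDH: combine our private key with the peer's public key.
    // Toy KDF (assumption): SHA-256 of the shared secret, truncated to 128 bits.
    static SecretKey deriveAesKey(PrivateKey own, PublicKey peer) throws Exception {
        KeyAgreement ka = KeyAgreement.getInstance("ECDH");
        ka.init(own);
        ka.doPhase(peer, true);
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(ka.generateSecret());
        return new SecretKeySpec(Arrays.copyOf(hash, 16), "AES");
    }

    // Fast symmetric bulk encryption with AES-128-GCM (never reuse an IV with the same key).
    static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        KeyPair server = ephemeralKeyPair();
        KeyPair client = ephemeralKeyPair();

        // Only public keys cross the wire, yet both sides derive the same AES key.
        SecretKey serverKey = deriveAesKey(server.getPrivate(), client.getPublic());
        SecretKey clientKey = deriveAesKey(client.getPrivate(), server.getPublic());

        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        byte[] ciphertext = crypt(Cipher.ENCRYPT_MODE, clientKey, iv,
                "create /app data".getBytes(StandardCharsets.UTF_8));
        byte[] plaintext = crypt(Cipher.DECRYPT_MODE, serverKey, iv, ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8)); // prints: create /app data
    }
}
```

The expensive asymmetric operations happen once per connection, while every subsequent message only pays for the cheap symmetric cipher.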
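At its core, trusting a certificate means checking a digital signature with a public key you already trust. The following Java sketch, using only java.security, illustrates just that step: a hypothetical CA signs the server’s identity data, and the client accepts it only if one of the public keys in its “truststore” verifies the signature. It’s a deliberate simplification, assuming raw bytes instead of real X.509 certificates; the JDK’s trust managers additionally check the certificate chain, validity period, key usage, and more. The class TrustSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Collections;
import java.util.List;

// Hypothetical demo: signature-based trust only, not real X.509 path validation.
class TrustSketch {

    static KeyPair rsaKeyPair() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    // The CA signs the server's identity data with its private key.
    static byte[] sign(PrivateKey caPrivateKey, byte[] data) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(caPrivateKey);
        s.update(data);
        return s.sign();
    }

    // The "truststore" is modeled as a list of trusted CA public keys.
    static boolean isTrusted(List<PublicKey> trustStore, byte[] data, byte[] signature) throws Exception {
        for (PublicKey caPublicKey : trustStore) {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(caPublicKey);
            s.update(data);
            if (s.verify(signature)) {
                return true;
            }
        }
        return false; // nothing trusted vouches for it: a TLS client would abort here
    }

    public static void main(String[] args) throws Exception {
        KeyPair trustedCa = rsaKeyPair();
        KeyPair unknownCa = rsaKeyPair();
        byte[] identity = "CN=andor-5560-ubuntu".getBytes(StandardCharsets.UTF_8);
        List<PublicKey> trustStore = Collections.singletonList(trustedCa.getPublic());

        System.out.println(isTrusted(trustStore, identity, sign(trustedCa.getPrivate(), identity))); // true
        System.out.println(isTrusted(trustStore, identity, sign(unknownCa.getPrivate(), identity))); // false
    }
}
```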
It’s advisable to add password protection to these stores and limit the permissions in the local filesystem to avoid compromising the content.
For generic TLS connections (like opening a page in a web browser), the client doesn’t need a certificate, so it doesn’t have a keystore set up: only the server’s authenticity is verified in the protocol. However, if we set up keystores for all ZooKeeper peers, both parties authenticate each other. This is called mTLS (mutual TLS), and it is enabled by default in ZooKeeper.
Wire encryption of quorum communication was introduced in ZOOKEEPER-236 and released in Apache ZooKeeper 3.5.5. The solution covers all communication between quorum members, including leader election and the ZooKeeper Atomic Broadcast protocol (ZAB).
According to the ZooKeeper documentation, enabling Quorum TLS comes down to adding the following properties to the zoo.cfg file:
sslQuorum=true
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.quorum.keyStore.location=/path/to/keystore.jks
ssl.quorum.keyStore.password=password
ssl.quorum.trustStore.location=/path/to/truststore.jks
ssl.quorum.trustStore.password=password
Note that TLS, and encryption in general, is only available in ZooKeeper with the Netty communication framework. Hence, both the server and the client have to be migrated together to use Netty for communication.
For instructions on creating keystores and truststores, please refer to the ZooKeeper documentation. After enabling the feature, we can see the following in the startup logs of each quorum member:
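Under the hood, a keystore/truststore pair like the one configured above ends up in a standard JSSE SSLContext. The Java sketch below shows how such a context can be built from the four values that the ssl.quorum.keyStore.* and ssl.quorum.trustStore.* properties describe. The class and method names are hypothetical, the store type is passed in explicitly (“JKS” for the .jks files above), and ZooKeeper’s own implementation differs in detail (for example in store type detection and certificate reloading).

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper, not ZooKeeper's API: builds an SSLContext from a
// keystore/truststore location and password pair, as configured in zoo.cfg.
class SslContextSketch {

    static KeyStore load(Path path, char[] password, String type) throws Exception {
        KeyStore store = KeyStore.getInstance(type); // e.g. "JKS" or "PKCS12"
        try (InputStream in = Files.newInputStream(path)) {
            store.load(in, password); // fails if the password is wrong or the file is corrupt
        }
        return store;
    }

    static SSLContext build(Path keyStorePath, char[] keyStorePassword,
                            Path trustStorePath, char[] trustStorePassword,
                            String storeType) throws Exception {
        // Keystore: our own private key and certificate, presented to peers.
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(load(keyStorePath, keyStorePassword, storeType), keyStorePassword);

        // Truststore: certificates used to verify whatever the peer presents.
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(load(trustStorePath, trustStorePassword, storeType));

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null); // null: default SecureRandom
        return ctx;
    }
}
```

A wrong password or an unreadable file surfaces here as an exception at startup, which is usually the first thing to check when a quorum member refuses to start after enabling TLS.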
INFO [main:QuorumPeer@1789] - Using TLS encrypted quorum communication
INFO [main:QuorumPeer@1797] - Port unification disabled
...
INFO [QuorumPeerListener:QuorumCnxManager$Listener@877] - Creating TLS-only quorum server socket
This tells us that ZooKeeper uses TLS-encrypted channels for quorum communication and that port unification is disabled. Port unification is a feature that plays a key role in a zero-downtime upgrade of a ZooKeeper ensemble when enabling Quorum TLS (see the ZooKeeper documentation for details). It’s needed for an intermediate step of the upgrade process, when both TLS and non-TLS connections are accepted. The above log snippet reflects the final phase, when the doors are already closed and non-TLS connections are automatically rejected.
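Port unification is possible because a TLS connection is easy to recognize from its first bytes: a TLS session starts with a handshake record whose content type byte is 0x16, followed by a major version byte of 0x03 (shared by SSL 3.0 and all TLS versions). The hypothetical Java sketch below shows this simplified check; it illustrates the idea only, and ZooKeeper’s UnifiedServerSocket may detect TLS differently.

```java
// Hypothetical helper: simplified TLS sniffing, the idea behind port unification.
class TlsSniffer {
    static final int TLS_HANDSHAKE = 0x16;     // TLS record content type: handshake
    static final int TLS_MAJOR_VERSION = 0x03; // SSL 3.0 and every TLS version

    /** Returns true if the first bytes of a connection look like a TLS handshake record. */
    static boolean looksLikeTls(byte[] firstBytes) {
        if (firstBytes.length < 2) {
            return false; // not enough data to decide; a real server would wait for more
        }
        return (firstBytes[0] & 0xFF) == TLS_HANDSHAKE
                && (firstBytes[1] & 0xFF) == TLS_MAJOR_VERSION;
    }

    public static void main(String[] args) {
        byte[] clientHello = {0x16, 0x03, 0x01, 0x02, 0x00}; // start of a TLS ClientHello record
        byte[] plaintext = {0x00, 0x00, 0x00, 0x2c};         // e.g. a length-prefixed plaintext frame
        System.out.println(looksLikeTls(clientHello)); // true
        System.out.println(looksLikeTls(plaintext));   // false
    }
}
```

With port unification enabled, a server can peek at these bytes and hand the socket to either the TLS or the plaintext code path; once it is disabled, as in the logs above, anything that isn’t TLS is simply rejected.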
Let’s check another log snippet from the startup during the leader election:
INFO [QuorumConnectionThread-[myid=1]-1:o.a.z.s.q.QuorumCnxManager@388] -
SSL handshake complete with andor-5560-ubuntu/192.168.1.216:4182 - TLSv1.2 -
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
INFO [ListenerHandler-andor-5560-ubuntu/192.168.1.216:4181:o.a.z.s.q.UnifiedServerSocket$UnifiedSocket@266] -
Accepted TLS connection from andor-5560-ubuntu/192.168.1.216:34444 - TLSv1.2 -
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
The two entries refer to an outbound and an inbound leader election connection. They confirm that a secure, TLS-encrypted communication channel has been established between quorum members. The protocol version (TLSv1.2) and the negotiated cipher suite are also shown.
After a successful leader election, the log entry for the quorum connection is a bit harder to spot, because it’s very similar to what we’ve seen previously:
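When auditing many quorum members, it can help to extract the protocol version and cipher suite from such log entries automatically. Below is a minimal Java sketch, assuming each entry sits on a single line in the actual log file (the entries above are wrapped for readability) and ends with the “- TLSv1.2 - TLS_…” pattern shown here. The class name and regular expression are hypothetical, and the log wording is not a guaranteed, stable ZooKeeper format.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner based on the sample log lines in this post.
class TlsLogScanner {
    // Matches e.g. "... - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
    private static final Pattern TLS_INFO =
            Pattern.compile("-\\s+(TLSv1(?:\\.\\d)?)\\s+-\\s+(TLS_[A-Z0-9_]+)");

    /** Returns {protocol, cipherSuite} if the line reports an established TLS channel. */
    static Optional<String[]> extract(String line) {
        Matcher m = TLS_INFO.matcher(line);
        if (m.find()) {
            return Optional.of(new String[] {m.group(1), m.group(2)});
        }
        return Optional.empty();
    }

    /** Anything older than TLS 1.2 should be treated as a finding. */
    static boolean isOutdated(String protocol) {
        return !protocol.equals("TLSv1.2") && !protocol.equals("TLSv1.3");
    }

    public static void main(String[] args) {
        String line = "Accepted TLS connection from andor-5560-ubuntu/192.168.1.216:34444"
                + " - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256";
        // prints: TLSv1.2 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 outdated=false
        extract(line).ifPresent(r -> System.out.println(r[0] + " " + r[1] + " outdated=" + isOutdated(r[0])));
    }
}
```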
INFO [LearnerHandler-/192.168.1.216:56392:o.a.z.s.q.UnifiedServerSocket$UnifiedSocket@266] -
Accepted TLS connection from andor-5560-ubuntu/192.168.1.216:56392 - TLSv1.2 -
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
INFO [LearnerHandler-/192.168.1.216:56392:o.a.z.s.q.LearnerHandler@511] -
Follower sid: 2 : info : andor-5560-ubuntu:3182:4182:participant
Follower sid (server id) 2 has connected with TLS protocol version 1.2 as a voting participant.
Enabling encryption of client-server communication in ZooKeeper is very similar to Quorum TLS. It was added in ZOOKEEPER-2125 “SSL on Netty client-server communication”. As the name suggests, this feature also requires the Netty communication framework to be enabled on both the client and server side.
Server-side configuration in zoo.cfg file:
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.keyStore.location=/path/to/keystore.jks
ssl.keyStore.password=password
ssl.trustStore.location=/path/to/truststore.jks
ssl.trustStore.password=password
secureClientPort=2182
In the first line, we enabled the Netty communication framework for the server, and in the rest of the snippet we set up the key- and truststore for the server. Why do we need a truststore for the server? We’ll talk about it in a bit.
We also set a dedicated secure client port (secureClientPort) to keep TLS connections separate from non-secure communication on the regular client port.
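Since a single missing property can silently leave a channel unencrypted, a small pre-flight check of zoo.cfg can be handy. The Java sketch below reads the file with java.util.Properties (zoo.cfg uses simple key=value lines), reports missing Client TLS and Quorum TLS settings, and warns if the plaintext clientPort is still configured. The class name and the exact set of checks are assumptions for illustration; it doesn’t parse dynamic configuration files or validate the store files themselves.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for TLS settings in zoo.cfg (not part of ZooKeeper).
class ZooCfgTlsCheck {
    static final String NETTY_FACTORY = "org.apache.zookeeper.server.NettyServerCnxnFactory";

    static List<String> check(String zooCfg) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(zooCfg));
        List<String> problems = new ArrayList<>();

        // Client TLS: Netty connection factory, a secure port, and both stores.
        if (!NETTY_FACTORY.equals(p.getProperty("serverCnxnFactory"))) {
            problems.add("serverCnxnFactory is not Netty");
        }
        if (p.getProperty("secureClientPort") == null) {
            problems.add("secureClientPort is not set");
        }
        for (String key : new String[] {"ssl.keyStore.location", "ssl.trustStore.location"}) {
            if (p.getProperty(key) == null) {
                problems.add(key + " is not set");
            }
        }

        // Quorum TLS: must be switched on and needs its own store settings.
        if (Boolean.parseBoolean(p.getProperty("sslQuorum", "false"))) {
            for (String key : new String[] {"ssl.quorum.keyStore.location", "ssl.quorum.trustStore.location"}) {
                if (p.getProperty(key) == null) {
                    problems.add(key + " is not set");
                }
            }
        } else {
            problems.add("sslQuorum is not enabled");
        }

        // A configured clientPort still accepts plaintext connections.
        if (p.getProperty("clientPort") != null) {
            problems.add("clientPort is set: plaintext connections are still accepted");
        }
        return problems;
    }
}
```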
Client-side configuration as Java system properties:
export CLIENT_JVMFLAGS="
-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
-Dzookeeper.client.secure=true
-Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks
-Dzookeeper.ssl.keyStore.password=testpass
-Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks
-Dzookeeper.ssl.trustStore.password=testpass"
Here we set Java system properties on the command line; however, a relatively recent feature of the ZooKeeper client (about two years old at the time of writing, ZOOKEEPER-3689) allows us to set specific parameters, including TLS settings, in a client configuration file.
It’s important to enable the Netty framework on the client side too, and we also need to explicitly turn on secure mode (zookeeper.client.secure=true). The keystore and truststore settings are similar to the server side.
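If the client is embedded in a Java application rather than started via zkCli.sh, the same settings can be applied programmatically before the ZooKeeper client is created. A minimal sketch, assuming exactly the property names from the CLIENT_JVMFLAGS example above; the helper class is hypothetical. Keep in mind that system properties are JVM-wide, so an application talking to several clusters may prefer a per-connection client configuration object (ZKClientConfig) instead, and that real passwords should come from a protected source rather than source code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the client TLS settings from CLIENT_JVMFLAGS, applied
// as JVM-wide system properties. Call apply() before creating the ZooKeeper client.
class ZkClientTlsProperties {

    static Map<String, String> tlsSettings(String keyStore, String keyStorePassword,
                                           String trustStore, String trustStorePassword) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("zookeeper.clientCnxnSocket", "org.apache.zookeeper.ClientCnxnSocketNetty"); // Netty is required
        props.put("zookeeper.client.secure", "true"); // explicitly turn on secure mode
        props.put("zookeeper.ssl.keyStore.location", keyStore);
        props.put("zookeeper.ssl.keyStore.password", keyStorePassword);
        props.put("zookeeper.ssl.trustStore.location", trustStore);
        props.put("zookeeper.ssl.trustStore.password", trustStorePassword);
        return props;
    }

    static void apply(Map<String, String> props) {
        props.forEach(System::setProperty);
    }
}
```

Usage: ZkClientTlsProperties.apply(ZkClientTlsProperties.tlsSettings("/root/zookeeper/ssl/testKeyStore.jks", "testpass", "/root/zookeeper/ssl/testTrustStore.jks", "testpass")), then connect to the secure port.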
Make sure you connect the client to the secure port:
zkCli.sh -server andor-5560-ubuntu:2182
In the log output of the client, you can verify that the connection is secured by TLS:
[zk: andor-5560-ubuntu:2182(CONNECTING) 0] 2023-09-21 14:47:06,828 [myid:] - INFO
[epollEventLoopGroup-2-1:o.a.z.ClientCnxnSocketNetty$ZKClientPipelineFactory@454] -
SSL handler added for channel: [id: 0xf5c2cd48]
2023-09-21 14:47:06,840 [myid:] - INFO
[epollEventLoopGroup-2-1:o.a.z.ClientCnxn$SendThread@996] -
Socket connection established, initiating session, client:
/192.168.1.216:50036, server: andor-5560-ubuntu/192.168.1.216:2182
2023-09-21 14:47:06,841 [myid:] - INFO
[epollEventLoopGroup-2-1:o.a.z.ClientCnxnSocketNetty$1@183] -
channel is connected: [id: 0xf5c2cd48, L:/192.168.1.216:50036 -
R:andor-5560-ubuntu/192.168.1.216:2182]
The SSL handler has been added to the Netty channel and the channel is connected, which means that certificate verification and the TLS handshake have completed successfully.
Notice that we use the fully qualified domain name (FQDN) when connecting to a secure cluster.
This is because of how certificate verification works in the TLS protocol. Although it’s possible to add IP addresses to a certificate, it’s usually not done, because the certificate would have to be revoked and renewed whenever the server’s IP address changes. In most cases, certificates are issued for the FQDN. When ZooKeeper verifies the certificate provided by the server, it takes the hostname from the connection string “andor-5560-ubuntu:2182” and compares it to the names included in the server’s certificate. If the hostname matches one in the certificate, hostname verification passes. This verification can be disabled via the ssl.hostnameVerification and ssl.quorum.hostnameVerification settings for Client TLS and Quorum TLS respectively, though this is only recommended for testing purposes.
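To illustrate what hostname verification compares, here is a deliberately simplified Java sketch: it checks the hostname from the connection string against the DNS names listed in a certificate, using case-insensitive exact matching plus a single left-most wildcard label (“*.example.com” covers “zk1.example.com” but not “a.zk1.example.com”). The class is hypothetical and not ZooKeeper’s actual implementation; it omits many rules of real verifiers (IP address entries, internationalized names, wildcard restrictions), so in practice rely on the verification built into ZooKeeper and the JDK.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified hostname check; not ZooKeeper's real verifier.
class HostnameCheck {

    static boolean matches(String host, List<String> certDnsNames) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String name : certDnsNames) {
            String n = name.toLowerCase(Locale.ROOT);
            if (n.equals(h)) {
                return true; // exact match, e.g. "andor-5560-ubuntu"
            }
            // "*.example.com" matches exactly one extra left-most label.
            if (n.startsWith("*.")) {
                int dot = h.indexOf('.');
                if (dot > 0 && h.substring(dot).equals(n.substring(1))) {
                    return true;
                }
            }
        }
        return false; // e.g. connecting by IP when the certificate only lists the hostname
    }
}
```

This is exactly why connecting to “192.168.1.216:2182” fails against a certificate issued for “andor-5560-ubuntu”, even though both point to the same server.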
We briefly mentioned that setting up a keystore for the client and a truststore for the server is not strictly necessary. The TLS protocol doesn’t require the client to provide a certificate (keystore), in which case the server doesn’t need to verify one (truststore), similar to how web browsers interact with web servers over HTTPS. Of course, ZooKeeper has settings to control this behavior in both the client and quorum protocols: ssl.clientAuth and ssl.quorum.clientAuth, which can take three values: none, want, and need. I find it easier to understand if we translate these to “none”, “optional”, and “required”. “None” means the feature is turned off, “want” means the server asks for a client certificate but it’s completely okay for the client not to provide one, and “need” means the client must provide one, otherwise the connection is dropped.
It may surprise you to find that ZooKeeper’s default setting for both protocols is need.
Why? Because it’s more secure. In the quorum protocol, all nodes act as both servers and clients, so both the keystore and the truststore are already set up. Why not enable the additional check? For the client protocol, I think it’s enabled mainly for convenience, since both protocols share a common code path for handling this configuration. Feel free to disable it to make your clients’ lives easier.
The foundation of TLS encryption is Public Key Infrastructure (PKI): a well-organized infrastructure of private and public keys, certificates, and the corresponding well-protected store files. However, the truth is that it’s a maintenance nightmare. It’s not just about issuing a key pair for every participant, but also about handling key expiry. Certificates should be issued for a short term: the shorter the validity period, the smaller the window in which an attacker can use a stolen key to compromise the system. At the same time, short-lived certificates make key renewal harder to keep up with.
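The three clientAuth values map naturally onto the standard JSSE server-side switches. The Java sketch below applies them to an SSLEngine in server mode; the enum and method names are hypothetical, and while this reflects how the JDK exposes the feature, ZooKeeper’s internal wiring may differ in detail.

```java
import java.util.Locale;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical mapping of ssl.clientAuth / ssl.quorum.clientAuth values to JSSE.
class ClientAuthSketch {
    enum ClientAuth { NONE, WANT, NEED }

    static ClientAuth parse(String value) {
        return ClientAuth.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    static void apply(SSLEngine serverEngine, ClientAuth mode) {
        switch (mode) {
            case NEED: // "required": the handshake fails without a client certificate
                serverEngine.setNeedClientAuth(true);
                break;
            case WANT: // "optional": a certificate is requested, but the client may skip it
                serverEngine.setWantClientAuth(true);
                break;
            default:   // "none": no client certificate is requested at all
                serverEngine.setNeedClientAuth(false);
                serverEngine.setWantClientAuth(false);
        }
    }

    static SSLEngine newServerEngine() throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(false); // client authentication is a server-side setting
        return engine;
    }
}
```

Note that in JSSE, setNeedClientAuth and setWantClientAuth override each other, so each engine ends up in exactly one of the three states.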
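On the renewal side, the basic monitoring task boils down to date arithmetic on each certificate’s expiry date. Below is a minimal Java sketch, assuming a renewal window you choose yourself: needsRenewal works on plain Instants, and expiringAliases applies it to every X.509 certificate in a loaded keystore or truststore via the standard X509Certificate.getNotAfter(). The class and method names are hypothetical.

```java
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical expiry monitor for keystore/truststore contents.
class CertExpiryCheck {

    /** True if the certificate expires within the renewal window (or has already expired). */
    static boolean needsRenewal(Instant notAfter, Instant now, Duration renewalWindow) {
        return !notAfter.isAfter(now.plus(renewalWindow));
    }

    /** Aliases whose X.509 certificates fall into the renewal window. */
    static List<String> expiringAliases(KeyStore store, Instant now, Duration window) throws Exception {
        List<String> result = new ArrayList<>();
        for (String alias : Collections.list(store.aliases())) {
            Certificate cert = store.getCertificate(alias);
            if (cert instanceof X509Certificate
                    && needsRenewal(((X509Certificate) cert).getNotAfter().toInstant(), now, window)) {
                result.add(alias);
            }
        }
        return result;
    }
}
```

Running a check like this on every node is exactly the kind of chore that grows with shorter certificate lifetimes.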
The good news is that Cloudera Manager does the hard work for us. AutoTLS is a mature, battle-tested feature of Cloudera Manager (CM) that can issue, manage, and renew self-signed certificates in both public and private cloud deployments. The Quorum and Client TLS features are both smoothly integrated with AutoTLS and come for free when the cluster is upgraded. If AutoTLS is already enabled, ZooKeeper will automatically use the existing certificates to secure quorum and client communication.
In recent versions of Cloudera (previously Cloudera Data Platform), both of these features are turned on automatically, which we can verify in Cloudera Manager.
Most Hadoop components depend on ZooKeeper. All of them are included, prepared, and automatically set up in the Cloudera distribution for a secure ZooKeeper connection. Unless a legacy application outside of the cluster has to be supported, the non-secure communication port can be disabled, and the cluster will keep operating normally. Doing so closes the door on plaintext communication.
Fast and reliable TLS encryption is a huge leap forward in hardening ZooKeeper security. However, we still have some gaps to fill. Enforcing authentication of connecting clients is a crucial part of a secure system and was a long-standing debt of the ZooKeeper community. ZOOKEEPER-1634 and ZOOKEEPER-3561 delivered this feature, and both are integrated into Cloudera as part of the upgrade to the latest stable Apache ZooKeeper 3.8.0. We’ll discuss it in detail in the next part of our ZooKeeper security hardening blog series. Stay tuned.