What are the best practices around HDFS Transparent Encryption?

I am curious to know the best practices you can think of around HDFS Transparent Encryption, such as key rotation, preventing impersonation attacks, implementing KMS ACLs, etc. Thanks!

1 ACCEPTED SOLUTION

You bring up a good point about impersonation attacks. If an attacker is able to authenticate as (or impersonate) another user, they can gain access to all kinds of data, keys, etc., that they shouldn’t. This is why it is very important to use a reliable authentication mechanism (e.g. Kerberos), require users to change passwords regularly, and use secure passwords. That reduces the chances that impersonation attacks will occur. That being said, there are a couple of impersonation scenarios that merit discussion:
  • Superuser impersonation (hdfs) - If an HDFS superuser account is compromised, that account would also need permissions on the EZ key via Ranger before the attacker could see any unencrypted data. By default, the hdfs user only gets Generate EEK and Get Metadata privileges on the keys in the KMS. That means that a user who impersonates the hdfs user still won’t be able to decrypt any file data (or even the EDEK stored in the file metadata on the NN).
  • Valid user impersonation - If a valid user account is impersonated, there are multiple authorization checks that need to be passed for the user to gain access to file data. That user would have to have HDFS permissions on the directory/file within the EZ to read the file, and the user would need to have Get Metadata and Decrypt EEK permissions on the EZ key to decrypt the file. If both of those authorizations exist for the compromised user, the attacker would be able to decrypt files within the EZ, but would not have access to the EZ key, nor the KMS master key.
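
The two scenarios above come down to the same rule: plaintext is reachable only when the HDFS permission check and the KMS key ACL check both pass. Here is a minimal sketch of that rule as a toy model. The class, the paths, the key name, and the users are all hypothetical, and the operation names simply mirror the Generate EEK / Get Metadata / Decrypt EEK privileges mentioned above; this is not the actual Ranger or KMS code path.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    # Hypothetical data: path -> users who pass the HDFS permission check.
    hdfs_readers: dict = field(default_factory=dict)
    # Hypothetical data: key name -> {user -> allowed KMS operations}.
    key_acls: dict = field(default_factory=dict)

    def can_read_plaintext(self, user: str, path: str, key: str) -> bool:
        """Both checks must pass before a user can see decrypted file data."""
        has_hdfs_access = user in self.hdfs_readers.get(path, set())
        allowed_ops = self.key_acls.get(key, {}).get(user, set())
        can_decrypt = {"GET_METADATA", "DECRYPT_EEK"} <= allowed_ops
        return has_hdfs_access and can_decrypt


cluster = ToyCluster(
    # The hdfs superuser can reach any file, so it is listed as a reader here.
    hdfs_readers={"/secure/payroll.csv": {"alice", "hdfs"}},
    key_acls={
        "payroll_key": {
            "hdfs": {"GENERATE_EEK", "GET_METADATA"},   # default-style grant
            "alice": {"GET_METADATA", "DECRYPT_EEK"},   # valid EZ user
        }
    },
)

hdfs_result = cluster.can_read_plaintext("hdfs", "/secure/payroll.csv", "payroll_key")
alice_result = cluster.can_read_plaintext("alice", "/secure/payroll.csv", "payroll_key")
```

In this model an attacker impersonating hdfs gets `False` (file access but no Decrypt EEK), while an attacker impersonating alice gets `True`, which is why protecting ordinary user accounts matters as much as protecting the superuser.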

All of the creation, encryption, and decryption of the DEKs is handled within the KMS. The user never sees the key that was used to encrypt the DEK (the EZ key). The user only sees the EDEK or the DEK. To protect the DEK as it is passed to the user to encrypt or decrypt the file, it is HIGHLY recommended to enable SSL on the KMS with a certificate that is trusted by the DFSClient (well known CA, internal CA trusted by the host, etc.). The most important things for ensuring the security of the system are the following:

  • Protect against user account compromise - Use secure passwords and rotate them regularly. Enable Kerberos so that every user must authenticate through a reliable mechanism. If Kerberos is not enabled, anyone can impersonate any user at any time simply by setting the HADOOP_USER_NAME environment variable. I’ve had customers say they want to secure their cluster without Kerberos. There is no such thing.
  • Protect the KMS keystore - It is imperative that the KMS keystore is kept secure. The database used as the backing store must be secured, locked down, restricted, firewalled, monitored, audited, and guarded. Period. If the keystore can be compromised, then the EZ keys can be compromised and none of the data is secure.
  • Secure network communication - The transmission of the keys between the KMS and the DFSClient needs to be secure. If it is not, then the DEK will be transmitted in the open when the DFSClient requests the unencrypted key.
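
To make the key flow concrete (the KMS wraps each DEK with the EZ key, and the client only ever handles the EDEK or the DEK), here is a rough sketch. Everything in it is an assumption for illustration: `ToyKMS` and its method names are hypothetical, and the cipher is a SHA-256 keystream XOR used as a stand-in so the example runs with the standard library only. It is not secure and is not the cipher HDFS uses.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher (NOT secure): XOR data with a SHA-256 keystream.

    XOR is symmetric, so the same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Hypothetical KMS model: EZ keys never leave this object."""

    def __init__(self):
        self._ez_keys = {}

    def create_key(self, name: str) -> None:
        self._ez_keys[name] = secrets.token_bytes(32)

    def generate_eek(self, name: str):
        """Create a fresh DEK and return only its encrypted form (the EDEK)."""
        dek = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return nonce, toy_cipher(self._ez_keys[name], nonce, dek)

    def decrypt_eek(self, name: str, nonce: bytes, edek: bytes) -> bytes:
        """Unwrap the EDEK; the returned DEK is what travels to the client."""
        return toy_cipher(self._ez_keys[name], nonce, edek)


kms = ToyKMS()
kms.create_key("ez_key")
nonce, edek = kms.generate_eek("ez_key")       # EDEK is stored with file metadata
dek = kms.decrypt_eek("ez_key", nonce, edek)   # client asks the KMS for the DEK

file_nonce = secrets.token_bytes(16)
ciphertext = toy_cipher(dek, file_nonce, b"sensitive record")
plaintext = toy_cipher(dek, file_nonce, ciphertext)
```

The value returned by `decrypt_eek` is the unencrypted DEK, which is exactly what crosses the network between the KMS and the client. That is the step the secure network communication point above is protecting.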

Rotating keys for an EZ helps to minimize the impact of a security breach. If an attacker gains access to the EZ somehow (most likely via a compromise of the KMS backing store or a brute force attack on the EDEK from the NN metadata), then rotating the keys regularly will minimize the exposure area (assuming a single key is compromised and not all of the keys). However, it is very expensive to rotate the key for all of the data in the EZ, because the data must be copied out of the EZ and then back into the EZ after the key is rotated in order to re-encrypt it and generate a new EDEK to store in the NN metadata.
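
The trade-off described above can be illustrated with a small toy model. It assumes each file simply records which version of the EZ key wrapped its DEK; `ToyZone` and its methods are hypothetical names, not part of any Hadoop API. Rolling the key only affects files written afterwards, so older files stay tied to the old version until they are copied out and back in.

```python
class ToyZone:
    """Hypothetical model of an encryption zone and its key versions."""

    def __init__(self):
        self.current_version = 0
        self.files = {}  # path -> EZ key version that wrapped the file's DEK

    def write(self, path: str) -> None:
        # A new file gets a new DEK wrapped by the current key version.
        self.files[path] = self.current_version

    def roll_key(self) -> None:
        # Rotation creates a new key version; existing EDEKs are untouched.
        self.current_version += 1

    def copy_out_and_back(self, path: str) -> None:
        # Rewriting the file re-encrypts it under the current key version.
        self.write(path)

    def exposed_by(self, compromised_version: int) -> list:
        # Files whose DEK could be unwrapped with the compromised version.
        return sorted(p for p, v in self.files.items() if v == compromised_version)


zone = ToyZone()
zone.write("/ez/a")
zone.write("/ez/b")
zone.roll_key()
zone.write("/ez/c")                 # only this file uses the new version

exposed_before = zone.exposed_by(0)  # ['/ez/a', '/ez/b']
zone.copy_out_and_back("/ez/a")      # the expensive step, one file at a time
exposed_after = zone.exposed_by(0)   # ['/ez/b']
```

In this model, regular rotation limits how many files any single key version covers, while shrinking the exposure of files that already exist requires rewriting them.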
