
HDFS encryption at rest


I am going through how to encrypt data at rest on HDFS.

I have one doubt here:

When reading a file from HDFS, the NameNode passes the EDEK to the client. The client then passes the EDEK to the KMS to obtain the DEK, fetches the blocks of that file from the DataNodes, and decrypts them using the DEK.

So it seems possible for somebody in between to sniff the DEK coming from the KMS and also sniff the data blocks coming from the DataNode. A man in the middle could then use both to decrypt the data. How is this taken care of? What type of communication is used between:

1. KMS and the client.

2. Namenode and the KMS.

so that no one in between can compromise the keys?



In a nutshell, communication in both cases 1 and 2 is encrypted, as depicted in the architecture diagram. I would encourage you to read the Transparent Data Encryption documentation for a better understanding.

HDFS data-at-rest encryption implements end-to-end encryption of data read from and written to HDFS. End-to-end encryption means that data is encrypted and decrypted only by the client. HDFS does not have access to unencrypted data or to unencrypted data encryption keys.
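
To make the key flow concrete, here is a minimal sketch of the envelope-encryption idea in Python. It is only an illustration under stated assumptions: `ToyKMS`, `toy_stream_xor`, and the key name `zone-key` are hypothetical names, and the cipher is a toy HMAC-SHA256 keystream standing in for the real AES-based encryption, so it should not be read as how Hadoop implements it. The point it shows is that the NameNode metadata only ever holds the EDEK, the DataNode only ever holds ciphertext, and only the KMS can turn an EDEK back into a DEK.

```python
import hashlib
import hmac
import os


def toy_stream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy counter-mode stream cipher (HMAC-SHA256 keystream).

    Illustration only: this is NOT AES and must not be used for real data.
    Applying it twice with the same key and IV returns the original bytes.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hmac.new(key, iv + counter, hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


class ToyKMS:
    """Hypothetical stand-in for the KMS: the only holder of zone keys."""

    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_edek(self, key_name: str):
        # Create a fresh DEK and return it only in encrypted form (EDEK).
        dek = os.urandom(32)
        iv = os.urandom(16)
        return iv, toy_stream_xor(self._zone_keys[key_name], iv, dek)

    def decrypt_edek(self, key_name: str, iv: bytes, edek: bytes) -> bytes:
        # In a real cluster this call would be authenticated and sent over TLS.
        return toy_stream_xor(self._zone_keys[key_name], iv, edek)


kms = ToyKMS()
kms.create_zone_key("zone-key")

# Write path: the NameNode obtains an EDEK and stores it with the file metadata.
edek_iv, edek = kms.generate_edek("zone-key")
namenode_metadata = {
    "key_name": "zone-key",
    "edek_iv": edek_iv,
    "edek": edek,
    "file_iv": os.urandom(16),
}

# The client asks the KMS to decrypt the EDEK, then encrypts locally.
plaintext = b"sensitive record"
dek = kms.decrypt_edek("zone-key", edek_iv, edek)
datanode_block = toy_stream_xor(dek, namenode_metadata["file_iv"], plaintext)

# Read path: EDEK from the NameNode, DEK from the KMS, ciphertext from the DataNode.
meta = namenode_metadata
dek_for_read = kms.decrypt_edek(meta["key_name"], meta["edek_iv"], meta["edek"])
recovered = toy_stream_xor(dek_for_read, meta["file_iv"], datanode_block)
```

In this sketch, someone who only sniffs the DataNode traffic sees `datanode_block`, which is ciphertext. The sensitive hop is the `decrypt_edek` response, which is why the client-to-KMS channel needs transport encryption and authentication.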



@Geoffrey Shelton Okot Thanks for the reply. I have gone through the document mentioned above, but I could not understand how the communication between the client and the KMS, and between the NameNode and the KMS, is made secure. Could you please help with that?



To ensure security you will need both SSL and Kerberos. Without Kerberos you have no authentication, and hence no real security: even if you encrypt the data, nothing stops anyone from talking to the cluster while claiming to be the administrative user, which opens the door to an attack.
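
As a rough illustration of what "SSL and Kerberos" can look like in configuration, the fragment below uses standard Hadoop property names. The host name `kms-host.example.com` and port `9600` are placeholders, not values from this thread, and the exact set of properties depends on your Hadoop version and distribution. The TLS keystore setup on the KMS server itself is not shown.

```xml
<!-- core-site.xml (clients and NameNode): reach the KMS over HTTPS.
     Host and port are placeholders. -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://https@kms-host.example.com:9600/kms</value>
</property>
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: optionally encrypt DataNode block transfers as well. -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- kms-site.xml: require Kerberos authentication for KMS callers. -->
<property>
  <name>hadoop.kms.authentication.type</name>
  <value>kerberos</value>
</property>
```

With the `https` scheme in the provider path, the DEK returned by the KMS travels over TLS. Kerberos ensures that only an authenticated principal, subject to the KMS ACLs, can ask for an EDEK to be decrypted in the first place.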