Can we move a file out of a TDE encryption zone?

Expert Contributor

I am new to TDE and one of our customers would like to know the following:

What happens when an encrypted file is moved from an encryption zone to another location on HDFS? Can we still decrypt it and re-encrypt it using the same key, or can we no longer decrypt the file once it has been moved out of its encryption zone?

1 ACCEPTED SOLUTION

Master Mentor
@rbalam

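You can copy a file from a TDE encryption zone to another location as long as you have access to the zone key (through the KMS) and permission to read the file. The client decrypts the data as it reads it, so the copy is stored as plaintext outside an encryption zone, or re-encrypted with a new EDEK under the destination zone's key if it lands in another zone. A plain rename (mv) across an encryption zone boundary, however, is rejected by HDFS.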

https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html#Acc...
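
HDFS rejects renames that cross an encryption zone boundary (out of a zone, into a zone, or between two zones), as described in the Apache Hadoop Transparent Encryption documentation, whereas a copy reads through the client and writes a new file. Below is a minimal Python sketch of the rename rule only. The function names are hypothetical, zones are assumed to be given as a list of non-nested root paths, and edge cases such as renaming a zone root itself or trash handling are ignored.

```python
import posixpath
from typing import Optional, Sequence


def zone_of(path: str, zone_roots: Sequence[str]) -> Optional[str]:
    """Return the encryption zone root containing `path`, or None if it is outside every zone."""
    p = posixpath.normpath(path)
    for root in zone_roots:
        r = posixpath.normpath(root)
        # "/zone10/x" must not match root "/zone1", so compare on a trailing "/".
        if p == r or p.startswith(r.rstrip("/") + "/"):
            return r
    return None


def rename_allowed(src: str, dst: str, zone_roots: Sequence[str]) -> bool:
    """True only if source and destination sit in the same zone (or both outside any zone).

    This rejects the three cases HDFS refuses: zone -> outside, outside -> zone,
    and zone -> a different zone.
    """
    return zone_of(src, zone_roots) == zone_of(dst, zone_roots)
```

For example, `rename_allowed("/zone/encryptedFile", "/home/bob/encryptedFile", ["/zone"])` returns False, while a move that stays inside `/zone` returns True. Getting the data out of the zone therefore means copying it (which decrypts on read) and then deleting the source.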

When creating a new file in an encryption zone, the NameNode asks the KMS to generate a new EDEK encrypted with the encryption zone’s key. The EDEK is then stored persistently as part of the file’s metadata on the NameNode.

When reading a file within an encryption zone, the NameNode provides the client with the file’s EDEK and the encryption zone key version used to encrypt the EDEK. The client then asks the KMS to decrypt the EDEK, which involves checking that the client has permission to access the encryption zone key version. Assuming that is successful, the client uses the DEK to decrypt the file’s contents.

All of the above steps for the read and write path happen automatically through interactions between the DFSClient, the NameNode, and the KMS.

Access to encrypted file data and metadata is controlled by normal HDFS filesystem permissions. This means that if HDFS is compromised (for example, by gaining unauthorized access to an HDFS superuser account), a malicious user only gains access to ciphertext and encrypted keys. However, since access to encryption zone keys is controlled by a separate set of permissions on the KMS and key store, this does not pose a security threat.
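
The write and read paths quoted above can be sketched as a small in-memory Python model that shows who holds which key. Everything in it is hypothetical: `ToyKMS`, `ToyNameNode`, `client_write` and `client_read` are not Hadoop APIs, the `name@0` key version format is an assumption, ciphertext is kept on the toy NameNode rather than on DataNodes, and `toy_ctr` is an insecure SHA-256 keystream standing in for the AES-CTR cipher HDFS uses.

```python
from __future__ import annotations

import hashlib
import secrets
from dataclasses import dataclass


def toy_ctr(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stand-in for AES-CTR: XOR with a SHA-256 counter keystream.

    NOT secure. Applying it twice with the same key and IV returns the input.
    """
    out = bytearray()
    for i in range(0, len(data), 32):
        counter = (i // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], keystream))
    return bytes(out)


@dataclass(frozen=True)
class FileEncryptionInfo:
    key_version: str  # zone key version that encrypted the EDEK
    edek_iv: bytes
    edek: bytes       # the file's DEK, encrypted; persisted as file metadata
    data_iv: bytes


class ToyKMS:
    """Holds zone keys and per-key ACLs; the NameNode never sees key material."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}    # key version -> key material
        self._current: dict[str, str] = {}   # key name -> current version
        self._acl: dict[str, set[str]] = {}  # key name -> users allowed to use it

    def create_key(self, name: str, allowed_users: set[str]) -> None:
        version = f"{name}@0"
        self._keys[version] = secrets.token_bytes(32)
        self._current[name] = version
        self._acl[name] = set(allowed_users)

    def generate_encrypted_key(self, name: str) -> tuple[str, bytes, bytes]:
        # Write path: a fresh DEK per file, handed out only in encrypted form.
        version = self._current[name]
        dek, iv = secrets.token_bytes(32), secrets.token_bytes(16)
        return version, iv, toy_ctr(self._keys[version], iv, dek)

    def decrypt_encrypted_key(self, user: str, info: FileEncryptionInfo) -> bytes:
        # The KMS enforces its own ACL, separate from HDFS file permissions.
        name = info.key_version.rsplit("@", 1)[0]
        if user not in self._acl[name]:
            raise PermissionError(f"{user} may not use key {name}")
        return toy_ctr(self._keys[info.key_version], info.edek_iv, info.edek)


class ToyNameNode:
    """Keeps each file's EDEK in its metadata (and, for brevity, its ciphertext)."""

    def __init__(self, kms: ToyKMS, zone_key: str) -> None:
        self.kms, self.zone_key = kms, zone_key
        self.meta: dict[str, FileEncryptionInfo] = {}
        self.blocks: dict[str, bytes] = {}  # stands in for DataNode storage

    def create(self, path: str) -> FileEncryptionInfo:
        # Ask the KMS for a new EDEK and persist it with the file's metadata.
        version, edek_iv, edek = self.kms.generate_encrypted_key(self.zone_key)
        info = FileEncryptionInfo(version, edek_iv, edek, secrets.token_bytes(16))
        self.meta[path] = info
        return info


def client_write(nn: ToyNameNode, kms: ToyKMS, user: str, path: str, data: bytes) -> None:
    info = nn.create(path)
    dek = kms.decrypt_encrypted_key(user, info)  # client unwraps the DEK via the KMS
    nn.blocks[path] = toy_ctr(dek, info.data_iv, data)


def client_read(nn: ToyNameNode, kms: ToyKMS, user: str, path: str) -> bytes:
    info = nn.meta[path]  # NameNode returns the EDEK and key version
    dek = kms.decrypt_encrypted_key(user, info)  # KMS checks the user's key access
    return toy_ctr(dek, info.data_iv, nn.blocks[path])
```

After `kms.create_key("zonekey", {"alice"})`, `client_read` returns the plaintext for `alice` but raises `PermissionError` for any user outside the key's ACL, even though that user can see the same EDEK and ciphertext through the NameNode. That mirrors the point of the last quoted paragraph: HDFS access alone yields only ciphertext and encrypted keys.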

Master Guru

People with more knowledge of TDE may correct me, but I don't see anything about cp/mv into or out of an encryption zone, only get/put. You can of course use MapReduce to read from a zone and write the output somewhere else (to get unencrypted data) or into another encryption zone (to get encrypted data), but I'm not sure whether a simple hadoop fs -cp or -mv does the same. Does anybody with TDE experience know?

What you can do is use the hidden /.reserved/raw/ path to access the encrypted data directly, for example to copy it to a backup server without having to decrypt and re-encrypt anything.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/config-use-hdfs-...

"To retain this workflow when using HDFS encryption, a new virtual path prefix has been introduced, /.reserved/raw/. This virtual path gives super users direct access to the underlying encrypted block data in the file system, allowing super users to distcp data without requiring access to encryption keys. This also avoids the overhead of decrypting and re-encrypting data. The source and destination data will be byte-for-byte identical, which would not be true if the data were re-encrypted with a new EDEK."
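
The difference the quote describes, a byte-for-byte identical raw copy versus a copy re-encrypted with a new EDEK, can be illustrated with a toy Python model. `raw_path`, `StoredFile`, `raw_copy` and `reencrypting_copy` are hypothetical names, `toy_crypt` is an insecure stand-in for AES-CTR, and the new EDEK is just random bytes because the KMS is not modeled here. Only the /.reserved/raw prefix itself comes from the quoted documentation.

```python
from __future__ import annotations

import hashlib
import posixpath
import secrets
from dataclasses import dataclass

RAW_PREFIX = "/.reserved/raw"  # virtual prefix named in the HDFS docs


def raw_path(path: str) -> str:
    """Map an absolute HDFS path to its /.reserved/raw view."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    return RAW_PREFIX + posixpath.normpath(path)


def toy_crypt(dek: bytes, data: bytes) -> bytes:
    """Insecure AES-CTR stand-in: XOR with a SHA-256 keystream (encrypts and decrypts)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        keystream = hashlib.sha256(dek + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], keystream))
    return bytes(out)


@dataclass(frozen=True)
class StoredFile:
    edek: bytes        # opaque without the KMS
    ciphertext: bytes  # what is actually stored in the blocks


def raw_copy(src: StoredFile) -> StoredFile:
    # Copy via /.reserved/raw: ciphertext and EDEK move as-is; no key is used.
    return StoredFile(src.edek, src.ciphertext)


def reencrypting_copy(src: StoredFile, src_dek: bytes) -> tuple[StoredFile, bytes]:
    # Copy via the normal path: the client decrypts, and the destination file
    # gets a fresh DEK. The random "EDEK" stands in for the KMS-wrapped new DEK.
    plaintext = toy_crypt(src_dek, src.ciphertext)
    new_dek = secrets.token_bytes(32)
    new_file = StoredFile(secrets.token_bytes(32), toy_crypt(new_dek, plaintext))
    return new_file, new_dek
```

In this sketch `raw_copy(f) == f` always holds, which is the byte-for-byte property, while a re-encrypting copy decrypts to the same plaintext but differs from the source in both EDEK and ciphertext.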

Expert Contributor

That's interesting. Can someone confirm that we can't use cp or mv on files in an encryption zone?