How to view the data stored in an encryption zone (encrypted data)

I have recently started working with encryption zones, and for a demo I want to display the encrypted data to show that the data in an encryption zone is actually encrypted.

If there is an API or any other way to do this, please share.

Re: how to have a look at the data present in encryption zone (Encrypted data)

@sachin gupta

To protect data in motion, you need to understand the underlying protocols used when data is transferred over the network in Hadoop. A Hadoop client connects to the NameNode using the Hadoop RPC protocol over TCP, while it transfers data to and from DataNodes using the HDFS data transfer protocol, a streaming protocol over TCP (HTTP is used when access goes through WebHDFS). User authentication to the NameNode and JobTracker services goes through Hadoop's remote procedure call layer using the SASL framework, with Kerberos as the authentication protocol within SASL.

The SASL framework can also encrypt data while it is being imported into the Hadoop ecosystem, thus protecting data in motion. With SASL encryption enabled, data exchanged between the client and the servers is encrypted and is not readable by a "man in the middle". SASL encryption is enabled by setting the property hadoop.rpc.protection to privacy in core-site.xml. This ensures that communication between the Hadoop client and the NameNode is secured and encrypted.

Any Hadoop client requesting data from HDFS fetches the data blocks directly from a DataNode after it obtains the block ID from the NameNode. The Block Access Token (BAT) ensures that only authorized users can access the data blocks stored on DataNodes.
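As a quick check of the hadoop.rpc.protection setting mentioned above, here is a minimal sketch in Python that parses a core-site.xml document and reads the property. The XML content is an illustrative sample, and the helper name get_property is my own, not part of any Hadoop API.

```python
import xml.etree.ElementTree as ET

# Illustrative core-site.xml content (an assumption, not a real cluster file).
CORE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of a Hadoop property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# "privacy" means RPC traffic is encrypted; the other documented values
# are "authentication" and "integrity".
rpc_protection = get_property(CORE_SITE_XML, "hadoop.rpc.protection")
rpc_is_encrypted = rpc_protection == "privacy"
```

On a real cluster you would pass in the text of the actual core-site.xml instead of the sample string.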

ENCRYPTION FOR DATA AT REST

An encryption zone is a special directory whose contents are transparently encrypted on write and transparently decrypted on read. Each encryption zone is associated with a single encryption zone key, which is specified when the zone is created. Each file within an encryption zone has its own unique data encryption key (DEK). DEKs are never handled directly by HDFS. Instead, HDFS only ever handles an encrypted data encryption key (EDEK). Clients decrypt an EDEK and then use the resulting DEK to read and write data. HDFS DataNodes simply see a stream of encrypted bytes.

A new cluster service is required to manage encryption keys: the Hadoop Key Management Server (KMS). In the context of HDFS encryption, the KMS has three basic responsibilities:

1) Providing access to stored encryption zone keys.

2) Generating new encrypted data encryption keys for storage on the NameNode.

3) Decrypting encrypted data encryption keys for use by HDFS clients.
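To make the DEK/EDEK flow above more concrete, here is a toy sketch in Python. It is not how HDFS or the KMS is implemented: ToyKMS, keystream_xor and the key name demo_zone_key are hypothetical, and a SHA-256 counter keystream stands in for the real cipher so the example needs only the standard library. It only illustrates who holds which key.

```python
import hashlib
import os


def keystream_xor(key, nonce, data):
    """Toy stream cipher: XOR data with a SHA-256 counter keystream.

    Stand-in for a real cipher; not secure, for illustration only.
    Applying it twice with the same key and nonce restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class ToyKMS:
    """Hypothetical in-memory KMS that holds encryption zone keys."""

    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, name):
        self._zone_keys[name] = os.urandom(32)

    def generate_edek(self, key_name):
        """Create a fresh DEK and return only its encrypted form (EDEK)."""
        dek = os.urandom(32)
        iv = os.urandom(16)
        return iv, keystream_xor(self._zone_keys[key_name], iv, dek)

    def decrypt_edek(self, key_name, iv, edek):
        """Recover the DEK from an EDEK on behalf of a client."""
        return keystream_xor(self._zone_keys[key_name], iv, edek)


kms = ToyKMS()
kms.create_zone_key("demo_zone_key")

# NameNode side: request an EDEK and keep it with the file's metadata.
iv, edek = kms.generate_edek("demo_zone_key")

# Client side: have the EDEK decrypted, then encrypt the data with the DEK.
dek = kms.decrypt_edek("demo_zone_key", iv, edek)
plaintext = b"account=42,balance=1000"
file_nonce = os.urandom(16)
stored_bytes = keystream_xor(dek, file_nonce, plaintext)  # what a DataNode stores

# Reading reverses the process with the same DEK.
recovered = keystream_xor(dek, file_nonce, stored_bytes)
```

The point of the sketch is that the storage side only ever holds stored_bytes and the EDEK; the plaintext DEK exists only on the client after the KMS has decrypted it.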

You may also check this: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html
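For the demo itself, the page linked above describes a virtual path prefix, /.reserved/raw, that gives HDFS superusers access to the underlying encrypted bytes of a file. Below is a small sketch in Python that builds the two hdfs dfs -cat commands (normal and raw) so the outputs can be compared side by side. The file path /zone/demo.txt and the helper names are assumptions for illustration, and the raw read must be run as an HDFS superuser.

```python
import subprocess

RAW_PREFIX = "/.reserved/raw"


def raw_path(hdfs_path):
    """Map an absolute HDFS path to its /.reserved/raw equivalent."""
    if not hdfs_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return RAW_PREFIX + hdfs_path


def cat_command(hdfs_path, raw=False):
    """Build the 'hdfs dfs -cat' command for the normal or raw view."""
    target = raw_path(hdfs_path) if raw else hdfs_path
    return ["hdfs", "dfs", "-cat", target]


def read_bytes(hdfs_path, raw=False):
    """Run the command and return its output as bytes.

    Needs the hdfs client on PATH; the raw view needs superuser rights.
    """
    result = subprocess.run(cat_command(hdfs_path, raw), check=True,
                            capture_output=True)
    return result.stdout
```

On a cluster, read_bytes("/zone/demo.txt") should return the decrypted content for an authorized user, while read_bytes("/zone/demo.txt", raw=True) run as a superuser should return the ciphertext, which is what you would show in the demo.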