HDFS Data at Rest Encryption with Ranger KMS - need clarifications

Explorer

Hello All,

We are planning to implement HDFS "Data at Rest" Encryption (Ranger KMS) on our data lake and have a few questions before we start. Can anyone help with these?

  1. Since we are adding an encryption layer on the data, does it impact performance when reading from and writing to HDFS?
  2. After implementing it, can we choose which data is encrypted and which is not? (We want to encrypt only a few datasets, not all.)

Thanks in advance

1 ACCEPTED SOLUTION

Master Mentor

@Vinu

HDFS "Data at Rest" Encryption

Hadoop provides several ways to encrypt stored data.

  • Volume encryption
  • Application-level encryption
  • HDFS data at rest encryption

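The last approach uses specially designated HDFS directories known as "encryption zones." An encryption zone is simply a special HDFS directory within which all data is encrypted on write and decrypted on read. Data outside an encryption zone is not encrypted by this mechanism, so you can choose which datasets to encrypt.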

You can have multiple encryption zones. With this configuration, you can use encrypted databases or tables with different encryption keys. To read data from read-only encrypted tables, users must have access to a temporary directory that is encrypted at least as strongly as the table.
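As a rough illustration of how a single dataset can be placed in its own encryption zone, here is a minimal Python sketch that only builds the command lines and does not run them. The commands themselves (`hadoop key create`, `hdfs dfs -mkdir`, `hdfs crypto -createZone`, `hdfs crypto -listZones`) are standard Hadoop CLI calls; the key name `finance_key`, the path `/data/finance`, and the helper name are hypothetical. It assumes the key is created by a user with key-admin rights in Ranger KMS and the zone by an HDFS superuser, and that the directory is empty when the zone is created.

```python
from typing import List


def encryption_zone_commands(key_name: str, zone_path: str) -> List[List[str]]:
    """Build (but do not run) the CLI calls that make zone_path an encryption zone.

    key_name and zone_path are placeholders chosen by the caller.
    """
    return [
        # 1. Create the encryption key in the KMS (Ranger KMS in this thread).
        ["hadoop", "key", "create", key_name],
        # 2. Create the target directory; it must be empty to become a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Mark the directory as an encryption zone tied to that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 4. List zones to confirm the new one is registered.
        ["hdfs", "crypto", "-listZones"],
    ]


if __name__ == "__main__":
    # Hypothetical key name and dataset path, for illustration only.
    for cmd in encryption_zone_commands("finance_key", "/data/finance"):
        print(" ".join(cmd))
```

Calling the helper once per dataset, each with its own key, would give several independent zones; directories that are never passed to `-createZone` stay unencrypted.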

HDFS encryption provides good performance, and existing Hadoop applications run transparently on encrypted data. For cloud data access, server-side encryption slightly slows down reads from S3, both when reading data during query execution and when scanning the files before the work is actually scheduled.

You can run two Hadoop performance tests, TestDFSIO and TeraSort, to measure performance in different encryption zones. TestDFSIO is more focused on storage I/O and throughput, while TeraSort represents a workload that is both I/O- and CPU-intensive. Both tests use the Hadoop Distributed File System (HDFS). These tests were run to compare encrypted data in different configurations, but the results also depend on your hardware; for example, using Xeon E5-2699 v3 instead of Xeon E5-2697 v2 processors gave a significant increase in performance in the test scenarios.
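Once you have run the same benchmark against an unencrypted directory and against an encryption zone, the comparison comes down to simple arithmetic on the two elapsed times. The sketch below is one way to express it; the function name is hypothetical and the numbers in the usage example are placeholders, not measurements from any cluster.

```python
def encryption_overhead_percent(plain_seconds: float, encrypted_seconds: float) -> float:
    """Return how much slower (in percent) the encrypted run was than the plain run.

    A negative result means the encrypted run was faster, which can happen
    because of run-to-run noise; repeat each benchmark several times.
    """
    if plain_seconds <= 0:
        raise ValueError("plain_seconds must be positive")
    return (encrypted_seconds - plain_seconds) / plain_seconds * 100.0


if __name__ == "__main__":
    # Placeholder timings for illustration only; substitute the elapsed times
    # reported by your own TestDFSIO or TeraSort runs.
    plain, encrypted = 100.0, 110.0
    print(f"overhead: {encryption_overhead_percent(plain, encrypted):.1f}%")
```

Comparing runs on the same cluster, with the same data size and the same number of files, keeps the hardware differences mentioned above out of the result.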

Reference: Data at rest encryption

3 REPLIES 3

Explorer

@Geoffrey Shelton Okot: Thanks for your detailed explanation, this helps a lot.

Master Mentor

@Vinu

Nice to know the explanation helped, but the best way to master it is to try it out. Could you please take a moment to "Accept" my response so other HCC members can easily reference it?