Registered: 12-30-2015

HDFS Encryption over wire

I have been reading about encrypting data over the wire when reading from and writing to HDFS.


I found that, to enable data encryption over the wire for HDFS, the user needs to do the following:

To enable encryption of data transferred between DataNodes and clients, and among DataNodes, proceed as follows:

  1. Enable Hadoop security using Kerberos.
  2. Select the HDFS service.
  3. Click the Configuration tab.
  4. Expand the Service-Wide category and click the Security subcategory. Configure the following properties:

Property                             Description
Enable Data Transfer Encryption      Check this field to enable wire encryption.
Data Transfer Encryption Algorithm   Optionally configure the algorithm used to encrypt data.
Hadoop RPC Protection                Select privacy.
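For a cluster that is not managed through Cloudera Manager, the three fields above roughly correspond to plain Hadoop properties. The sketch below is a minimal illustration, assuming the UI fields map to dfs.encrypt.data.transfer and dfs.encrypt.data.transfer.algorithm (in hdfs-site.xml) and to hadoop.rpc.protection (in core-site.xml). The helper name build_site_xml is hypothetical. It only renders the XML fragments and does nothing about Kerberos.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: mapping from the Cloudera Manager fields to standard Hadoop properties.
HDFS_SITE = {
    "dfs.encrypt.data.transfer": "true",            # Enable Data Transfer Encryption
    "dfs.encrypt.data.transfer.algorithm": "3des",  # optional; "rc4" is the other accepted value
}
CORE_SITE = {
    "hadoop.rpc.protection": "privacy",             # Hadoop RPC Protection
}


def build_site_xml(props):
    """Render a dict of Hadoop properties as a <configuration> document (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print("hdfs-site.xml:")
    print(build_site_xml(HDFS_SITE))
    print("core-site.xml:")
    print(build_site_xml(CORE_SITE))
```

The algorithm entry can be dropped entirely, since the table marks it as optional.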


I am trying to understand whether it is possible to encrypt data over the network without using Kerberos.

Assume I do not have Kerberos enabled for the cluster or the client application, and I still enable data transfer encryption by setting dfs.encrypt.data.transfer to true in hdfs-site.xml.
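The scenario above can at least be checked locally. The sketch below is a minimal illustration and not something HDFS itself runs: it parses site-file XML and flags a configuration in which dfs.encrypt.data.transfer is true while the prerequisites are missing. It assumes that step 1 of the quoted procedure ("Enable Hadoop security using Kerberos") corresponds to hadoop.security.authentication being set to kerberos, and, as a further assumption based on Hadoop's secure-mode documentation, that dfs.block.access.token.enable must also be true. The function names read_site_xml and check_wire_encryption are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_site_xml(text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict (hypothetical helper)."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


def check_wire_encryption(props):
    """Return warnings for a merged core-site + hdfs-site property dict (hypothetical check)."""
    warnings = []
    if props.get("dfs.encrypt.data.transfer", "false").lower() != "true":
        return warnings  # wire encryption not requested; nothing to check
    # ASSUMPTION: step 1 of the procedure means Kerberos authentication.
    if props.get("hadoop.security.authentication", "simple").lower() != "kerberos":
        warnings.append("hadoop.security.authentication is not 'kerberos'")
    # ASSUMPTION: data transfer encryption also relies on block access tokens.
    if props.get("dfs.block.access.token.enable", "false").lower() != "true":
        warnings.append("dfs.block.access.token.enable is not 'true'")
    return warnings


# Sample hdfs-site.xml matching the scenario: encryption on, nothing else set.
SAMPLE = """<configuration>
  <property><name>dfs.encrypt.data.transfer</name><value>true</value></property>
</configuration>"""

if __name__ == "__main__":
    for w in check_wire_encryption(read_site_xml(SAMPLE)):
        print("WARNING:", w)
```

Run against the sample, it prints two warnings. In practice the properties from core-site.xml and hdfs-site.xml would need to be merged into one dict before calling the check.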


This document does not clarify whether a key file is needed on the client machine. I am trying to understand how the client application knows how to encrypt and decrypt data without knowledge of a public key. I might be missing something here, but at least this part of the document says nothing about generating a key, copying it to the client machine, or using it in some way when connecting to the cluster.


Can someone please point me to any further details, in this document or elsewhere, on how to set up encryption over the wire step by step?