Created 11-18-2016 04:40 PM
In NiFi, how do I put data into HDFS with the data encrypted across the wire? The NiFi cluster would be separate from the HDFS cluster.
Created 11-18-2016 06:06 PM
Is this an on-prem-to-cloud kind of situation? Why would you want to encrypt only specific flows? For encryption to happen, the server has to enable it, e.g. HTTPS goes to port 443 instead of 80. Enable Hadoop wire encryption and try sending data without the encryption settings in the hdfs-site.xml you give to PutHDFS. I think it will keep both secure and insecure channels open.
Created 11-18-2016 05:05 PM
I think if you set up the cluster to use wire encryption, the NameNode and the HDFS client will handle it for you. You just need to have the updated hdfs-site.xml file available on NiFi.
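For reference, a minimal sketch of what the cluster-side setting might look like, assuming the standard dfs.encrypt.data.transfer property is the one being used for wire encryption of block data. This goes in hdfs-site.xml on the NameNode and DataNodes; other security settings your cluster may need are not shown.

```xml
<!-- hdfs-site.xml (NameNode and DataNodes): sketch only, other properties omitted -->
<configuration>
  <property>
    <!-- Encrypt block data read from / written to HDFS on the wire -->
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
</configuration>
```

On the NiFi side, a copy of the cluster's hdfs-site.xml (and core-site.xml) would then be listed in the PutHDFS processor's Hadoop Configuration Resources property.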
Created 11-18-2016 05:29 PM
I believe that is a cluster-wide setting for all client interactions with the NameNode/HDFS. I was hoping to isolate the encryption to specific flows on the NiFi side. Thoughts?
Created 11-18-2016 06:11 PM
Saw this in hdfs-default.xml:
Property: dfs.encrypt.data.transfer
Default: false
Description: Whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire. This only needs to be set on the NN and DNs, clients will deduce this automatically. It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class.
I guess it is doable.
Created 11-18-2016 06:17 PM
So you would have to create a class that extends TrustedChannelResolver and overrides the isTrusted() method, then point the HDFS property dfs.trustedchannel.resolver.class at it. I have not done this before, but if you are up to it I can assist.
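A rough sketch of how the resolver might be wired in, assuming the custom class has been compiled and placed on the classpath of whichever side needs the override. The class name com.example.hdfs.SelectiveTrustedChannelResolver is hypothetical and only stands in for your own TrustedChannelResolver subclass. As far as I understand it, a channel reported as trusted skips data-transfer encryption, so the subclass would return true only for the connections you are willing to leave unencrypted.

```xml
<!-- hdfs-site.xml: sketch only; the class name below is a hypothetical placeholder -->
<configuration>
  <property>
    <!-- Custom logic deciding per connection whether the channel is trusted -->
    <name>dfs.trustedchannel.resolver.class</name>
    <value>com.example.hdfs.SelectiveTrustedChannelResolver</value>
  </property>
</configuration>
```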
Created 11-18-2016 06:45 PM
I was hoping to be granular about encrypting sensitive vs. non-sensitive data flowing into HDFS, for performance reasons. If the performance difference is not that large, then it is no big deal.