Created 11-18-2016 09:44 PM
I want to encrypt all FlowFiles on disk in the NiFi cluster. Let's say I use GetSFTP, pass the data through a flow with many processors, then use PutSFTP.
Created 11-18-2016 09:47 PM
Yes, it would be unencrypted on disk after GetSFTP. Would OS-level encryption be an option? Then it would always be encrypted throughout.
Created 11-18-2016 10:08 PM
Thank you @Karthik Narayanan. I am wondering what the best practices are, especially since NiFi was built by the NSA. What would they recommend? (Not sure if you can answer this, but I would have thought on-disk encryption would be a more straightforward implementation.)
Created 11-18-2016 10:16 PM
I am not sure what the answer to that is. But since NiFi tracks provenance and lineage, it keeps a copy of the unencrypted file, because that is the content of the original FlowFile; when you encrypt, a new file is created with the encrypted content. When you decrypt, it reverts back to the original FlowFile and content. Basically, encryption is something we did to an original file, and it needs to be tracked. Maybe someone from the NiFi team can give a better answer.
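To illustrate the provenance point, here is a toy Python model. The classes and functions are hypothetical and this is not how NiFi is actually implemented: content is written once and never changed in place, so an "encrypt" step produces a new content claim while the earlier, unencrypted claim is still referenced by the lineage. The byte-reversal transform is a placeholder, not real encryption; the RECEIVE and CONTENT_MODIFIED labels are borrowed from NiFi's provenance event type names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyContentRepository:
    """Hypothetical stand-in for a content repository: claims are
    immutable and are never overwritten in place."""
    claims: dict = field(default_factory=dict)

    def write(self, data: bytes) -> str:
        claim_id = hashlib.sha256(data).hexdigest()
        self.claims[claim_id] = data
        return claim_id

    def read(self, claim_id: str) -> bytes:
        return self.claims[claim_id]


@dataclass
class ToyFlowFile:
    content_claim: str
    # Lineage as (event, claim_id) pairs, oldest first.
    lineage: list = field(default_factory=list)


def fake_encrypt(data: bytes) -> bytes:
    # Placeholder transform (byte reversal), NOT real encryption.
    return data[::-1]


def receive(repo: ToyContentRepository, data: bytes) -> ToyFlowFile:
    claim = repo.write(data)
    return ToyFlowFile(content_claim=claim, lineage=[("RECEIVE", claim)])


def modify_content(repo, ff, transform, event):
    # New content goes to a NEW claim; the old claim is left untouched.
    new_claim = repo.write(transform(repo.read(ff.content_claim)))
    ff.content_claim = new_claim
    ff.lineage.append((event, new_claim))
    return ff


repo = ToyContentRepository()
ff = receive(repo, b"account=12345")
ff = modify_content(repo, ff, fake_encrypt, "CONTENT_MODIFIED")
# The plaintext claim is still reachable through the lineage.
print(repo.read(ff.lineage[0][1]))  # b'account=12345'
```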
Created 11-19-2016 05:42 PM
I was able to find out that FlowFiles on disk are stored in a (human-unreadable) binary format, and thus there is little need to encrypt them. They do appear as readable to someone viewing provenance, but this can either be switched off or locked down by user role.
Created 11-20-2016 04:03 AM
@Karthik is correct that the provenance, content, and flowfile repositories are stored on disk unencrypted. Current recommendations are to restrict access to said repositories using OS-level access control (e.g. POSIX) and to use encrypted storage volumes.
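As a rough sketch of the "restrict access with POSIX permissions" part, the Python script below walks the repository directories and flags anything that grants group or other access. The default paths are an assumption (they match common nifi.properties defaults such as nifi.flowfile.repository.directory=./flowfile_repository), so check your own nifi.properties. It does not cover the encrypted-volume part, which is set up at the OS or storage layer.

```python
import os
import stat
import sys

# Assumed default locations relative to the NiFi home directory;
# check nifi.properties for the actual paths on your nodes.
DEFAULT_REPOS = [
    "./flowfile_repository",
    "./content_repository",
    "./provenance_repository",
]


def too_permissive(path: str) -> list[str]:
    """Return paths under `path` whose mode grants any group or other access."""
    offenders = []
    for root, _dirs, files in os.walk(path):
        for entry in [root, *(os.path.join(root, f) for f in files)]:
            mode = os.lstat(entry).st_mode
            if stat.S_ISLNK(mode):
                continue  # symlink permission bits are not meaningful; skip
            if mode & (stat.S_IRWXG | stat.S_IRWXO):
                offenders.append(entry)
    return offenders


if __name__ == "__main__":
    for repo in sys.argv[1:] or DEFAULT_REPOS:
        if not os.path.isdir(repo):
            print(f"not found, skipping: {repo}")
            continue
        for p in too_permissive(repo):
            print(f"group/other access: {p} ({stat.filemode(os.lstat(p).st_mode)})")
```

Run it from the NiFi home directory or pass the repository paths as arguments. Offenders could then be tightened with something like `chmod -R go-rwx` on the repository directories, owned by the user NiFi runs as.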
There is an existing security feature roadmap entry for transparent data encryption of the various repositories so that the values are never written to the file system in an unencrypted form. Obviously there are performance implications to take into consideration when developing this feature and an admin choosing to enable it.
Just because the repository format on disk is "human unreadable" binary does not preclude the security concerns here -- an arbitrary process with OS permission can read those files, and the serialization logic is open source.
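To make the "binary is not encryption" point concrete, the Python sketch below writes a record in a made-up, length-prefixed binary layout (hypothetical, not NiFi's actual repository format) and then recovers the text with a scan similar to the Unix `strings` tool. Anyone who also knows the layout, which for NiFi is in the open-source serialization code, can parse the record completely.

```python
import re
import struct


def write_record(attributes: dict[str, str], content: bytes) -> bytes:
    """Serialize a record in a made-up, length-prefixed binary layout
    (hypothetical; NOT NiFi's actual on-disk format)."""
    out = bytearray(b"\x00\x01")  # fake magic/version bytes
    out += struct.pack(">I", len(attributes))
    for key, value in attributes.items():
        for s in (key.encode(), value.encode()):
            out += struct.pack(">H", len(s)) + s
    out += struct.pack(">Q", len(content)) + content
    return bytes(out)


def printable_runs(blob: bytes, min_len: int = 4) -> list[bytes]:
    """Rough equivalent of `strings`: runs of printable ASCII bytes."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, blob)


blob = write_record({"filename": "payroll.csv"}, b"name=Alice,ssn=123-45-6789")
print(printable_runs(blob))
# [b'filename', b'payroll.csv', b'name=Alice,ssn=123-45-6789']
```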