NiFi: Encrypt all flowfiles on disk during full flow

Guru

I want to encrypt all flow files on disk in the NiFi cluster. Let's say I GetSFTP, run a flow with many processors, then PutSFTP.

  1. Do I simply place an EncryptContent (to encrypt) after the GetSFTP and another (to decrypt) before the PutSFTP?
  2. If so, won't the data be unencrypted on disk between GetSFTP and the first EncryptContent, and again between the second EncryptContent and PutSFTP?
1 ACCEPTED SOLUTION

Super Collaborator

Yes, it would be unencrypted on disk after the GetSFTP. Would OS-level encryption be an option? Then the data would always be encrypted throughout.

Guru

Thank you @Karthik Narayanan. I am wondering what the best practices are, especially since NiFi was built by the NSA. What would they recommend? (Not sure if you can answer this, but I would have thought on-disk encryption would be a more straightforward implementation.)

Super Collaborator

I am not sure what the answer to that is. But since NiFi tracks provenance and lineage, it keeps a copy of the unencrypted file, as that is the content of the original flow file; when the content is encrypted, a new flow file is created with the encrypted content. When you decrypt, it reverts back to the original flow file and content. Basically, encryption is something we did to an original file, and it needs to be tracked. Maybe someone from the NiFi team can give a better answer.

Guru

I was able to find out that FlowFiles on disk are stored in a (human unreadable) binary format and thus there is little need to encrypt. They do appear as readable to someone viewing provenance -- but this can either be switched off or locked down by user role.

avatar

@Karthik is correct that the provenance, content, and flowfile repositories are stored on disk unencrypted. Current recommendations are to restrict access to said repositories using OS-level access control (e.g. POSIX) and to use encrypted storage volumes.
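
As a rough illustration of the access-control recommendation, here is a minimal Python sketch that flags repository directories readable by anyone other than their owner. The directory names are assumed to be the stock defaults relative to the NiFi home, and `/opt/nifi` is a hypothetical install location; check the repository paths in your own `nifi.properties` before relying on it.

```python
import stat
from pathlib import Path

# Assumption: default repository directory names relative to the NiFi home.
# The real locations are whatever nifi.properties configures.
REPOSITORY_DIRS = (
    "flowfile_repository",
    "content_repository",
    "provenance_repository",
)


def overly_permissive(path: Path) -> bool:
    """Return True if group or other users hold any permission bit on path."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def audit(nifi_home: Path) -> list[Path]:
    """List existing repository directories accessible beyond their owner."""
    return [
        nifi_home / name
        for name in REPOSITORY_DIRS
        if (nifi_home / name).is_dir() and overly_permissive(nifi_home / name)
    ]


if __name__ == "__main__":
    # Hypothetical install location; adjust to your environment.
    for repo in audit(Path("/opt/nifi")):
        print(f"{repo}: accessible beyond owner, consider chmod 700")
```

This only checks the top-level directory mode bits. It does not cover ACLs, files inside the repositories, or whether the underlying volume is encrypted.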

There is an existing security feature roadmap entry for transparent data encryption of the various repositories, so that the values are never written to the file system in an unencrypted form. Obviously there are performance implications to consider, both when developing this feature and when an admin chooses to enable it.

Just because the repository format on disk is "human unreadable" binary does not preclude the security concerns here -- an arbitrary process with OS permission can read those files, and the serialization logic is open source.
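
To make that point concrete, here is a small Python sketch using a made-up length-prefixed binary layout. This is not NiFi's actual serialization format, and the attribute and content values are invented for the example. It only shows that binary framing by itself leaves the payload bytes intact and searchable.

```python
import struct


def serialize_record(attributes: dict[str, str], content: bytes) -> bytes:
    """Pack attributes and content into a hypothetical length-prefixed layout.

    Illustrative only: this is NOT how NiFi writes its repositories.
    """
    out = bytearray()
    out += struct.pack(">I", len(attributes))          # attribute count
    for key, value in attributes.items():
        for field in (key.encode("utf-8"), value.encode("utf-8")):
            out += struct.pack(">H", len(field)) + field  # 2-byte length + bytes
    out += struct.pack(">Q", len(content)) + content    # 8-byte length + payload
    return bytes(out)


# Invented sample data for the illustration.
record = serialize_record({"filename": "payroll.csv"}, b"ssn=123-45-6789")

# The file would look like binary noise in a text editor, yet any process
# allowed to read it can find the sensitive bytes with a plain search.
print(b"ssn=123-45-6789" in record)
print(b"payroll.csv" in record)
```

Any process with read permission could run the same kind of search (or simply `strings`) against an unencrypted repository file, which is why a binary format is not a substitute for access control or encryption.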