Created 02-26-2018 04:17 PM
Hi guys,
I have a data flow in NiFi that gets a file from the server and converts it to Avro to stream the data to Hive. The flow carries some sensitive information that I need to hash (SHA2_512). I see that NiFi has a couple of processors for hashing, but they seem to operate only on the whole file. Is there a way to hash a specific field? Before the conversion to Avro, my flow files come from the server with fields separated by pipes ('|').
Thanks in advance!
Cheers
Created 02-26-2018 04:40 PM
For algorithms that don't require secret information, I think it could be helpful to add such functions to NiFi Expression Language, although they don't exist right now (please feel free to file a Jira to add such a capability). With this capability you could use UpdateRecord to replace a field with its hashed value.
In the meantime, if you are comfortable with a scripting language such as Groovy, Javascript, Jython, JRuby, Clojure, or Lua, you could write a ScriptedLookupService that can "lookup" a specified field by returning the hash of its value. Then you can use LookupRecord with that service to get the hashed values. Alternatively you could implement a ScriptedRecordSetWriter that is configured to hash the values of the specified field(s), and then ConvertRecord with that writer.
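To illustrate the core of what such a script would do, here is a minimal Java sketch that replaces one field of a pipe-delimited record with its SHA-512 hex digest using the standard `MessageDigest` API. The class and method names are made up for this example; inside NiFi you would put equivalent logic into a ScriptedLookupService or an ExecuteScript body rather than a standalone class.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FieldHasher {

    // Replace one pipe-delimited field with its SHA-512 digest and rebuild the line.
    static String hashField(String line, int fieldIndex) throws NoSuchAlgorithmException {
        String[] fields = line.split("\\|", -1); // -1 keeps trailing empty fields
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] digest = md.digest(fields[fieldIndex].getBytes(StandardCharsets.UTF_8));
        // Render the 64-byte digest as lowercase hex, zero-padded to 128 characters.
        fields[fieldIndex] = String.format("%0128x", new BigInteger(1, digest));
        return String.join("|", fields);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Hypothetical record: the second field is the sensitive one.
        String record = "alice|123-45-6789|2018-02-26";
        System.out.println(hashField(record, 1));
    }
}
```

The same few lines translate almost directly into Groovy for an ExecuteScript processor; the record-based route (LookupRecord or ConvertRecord) is preferable once the data is treated as structured records rather than raw pipe-delimited text.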
Created 11-23-2019 08:51 AM
I explored hashing columns, with the ability to dynamically extract the data as CSV or Avro (the default), by developing a custom processor. The processor and its full source code are available at the link below; feel free to contribute further functionality.
https://github.com/vanducng/hashing-columns-nifi-processor