
Using Nifi to mask sensitive data

New Contributor

I am a newbie and have only read a little about NiFi.

We have an existing process that masks sensitive information in a database where many tables have millions of records. We first create a masking table that maps original values to masked values, and then use this table to mask hundreds of tables; at times we also use a sequence to scrub numeric fields. This process is written in PL/SQL.
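A rough sketch of the mapping-table approach described above, using Python's built-in sqlite3 module as a stand-in for the real database. The table and column names (`mask_map`, `customer`, `account_no`) are hypothetical, and the database sequence is emulated with a Python counter because SQLite has no sequences.

```python
import sqlite3
from itertools import count

# Hypothetical schema standing in for the real tables:
# mask_map holds original -> masked pairs; customer holds the sensitive data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mask_map (original TEXT PRIMARY KEY, masked TEXT NOT NULL);
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, account_no INTEGER);
    INSERT INTO mask_map VALUES ('John', 'Mark'), ('Alice', 'Beth');
    INSERT INTO customer VALUES (1, 'John', 11112222), (2, 'Alice', 33334444), (3, 'Zoe', 55556666);
""")

def mask_from_map(conn, table, column):
    # Replace each value that has an entry in mask_map; unmapped values stay as they are.
    # Identifiers are interpolated directly, so table/column names must be trusted.
    conn.execute(
        f"UPDATE {table} SET {column} = "
        f"(SELECT masked FROM mask_map WHERE original = {table}.{column}) "
        f"WHERE {column} IN (SELECT original FROM mask_map)"
    )

def scrub_with_sequence(conn, table, column, key, start):
    # Emulate a database sequence: every row gets the next value from a counter.
    seq = count(start)
    keys = [k for (k,) in conn.execute(f"SELECT {key} FROM {table} ORDER BY {key}")]
    conn.executemany(
        f"UPDATE {table} SET {column} = ? WHERE {key} = ?",
        [(next(seq), k) for k in keys],
    )

mask_from_map(conn, "customer", "name")
scrub_with_sequence(conn, "customer", "account_no", "cust_id", 90000001)
conn.commit()
result = conn.execute("SELECT cust_id, name, account_no FROM customer ORDER BY cust_id").fetchall()
```

Note that `Zoe` has no entry in the masking table and is therefore left unmasked; a real run would need a rule for such values.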

A few parent tables are masked first, and these tables are then used to mask the child tables, using MERGE or UPDATE statements in PL/SQL.
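The parent-first ordering matters when a parent key is itself scrubbed: the old-to-new key map produced while masking the parent is what keeps child foreign keys pointing at the right rows. A minimal in-memory sketch, assuming hypothetical `customers` and `orders` rows and a sequence starting at 5000:

```python
from itertools import count

def mask_parent_keys(parent_rows, key, start):
    """Give each parent row a new key from a sequence; return the old -> new map."""
    seq = count(start)
    key_map = {}
    for row in parent_rows:
        new_key = next(seq)
        key_map[row[key]] = new_key
        row[key] = new_key
    return key_map

def mask_child_fks(child_rows, fk, key_map):
    """Rewrite child foreign keys with the parent's old -> new map (KeyError if orphaned)."""
    for row in child_rows:
        row[fk] = key_map[row[fk]]

# Hypothetical parent and child rows.
customers = [{"cust_id": 101, "name": "John"}, {"cust_id": 102, "name": "Alice"}]
orders = [
    {"order_id": 1, "cust_id": 102},
    {"order_id": 2, "cust_id": 101},
    {"order_id": 3, "cust_id": 102},
]

# Parent first, so the key map exists before any child table is touched.
key_map = mask_parent_keys(customers, "cust_id", 5000)
mask_child_fks(orders, "cust_id", key_map)
```

In a database, the key map would typically be kept as another mapping table so that each child table can be processed afterwards, possibly in a separate run.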

However, my client has directed us to use NiFi, because it can interact with various systems and is more scalable. NiFi's ability to interact with Hadoop and to generate JSON messages are other big reasons for this move.

While I am writing this question, I am also reading about NiFi.

However, I first need to build a PoC to see whether what we are doing right now can be achieved using NiFi. I have a few questions, and it would be great if someone could give me some pointers:

My questions:

1. Can I design a similar process, where I use the already populated masking table to mask all the tables accordingly?

2. Can I pass the list of tables as an input parameter to the process?

3. Can I restart a process if there is a failure during execution?

4. Does NiFi have any built-in processor to handle such requests, i.e. masking sensitive information in tables?

For example, if Name needs to be masked, the processor should mask John as Mark in every table that has a Name field.
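Questions 1, 2 and 4 together amount to: given a list of tables, apply the existing masking table to every table that has a given column. A sketch of that loop, again using sqlite3 as a stand-in; the function name `mask_column_in_tables`, the `mask_map` table and the demo tables are assumptions, and a PL/SQL or NiFi version would look different.

```python
import sqlite3

def column_names(conn, table):
    # PRAGMA table_info yields one row per column; field 1 is the column name.
    # For a table that does not exist it yields no rows.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]

def mask_column_in_tables(conn, tables, column, map_table="mask_map"):
    """Apply map_table (original -> masked) to `column` in each listed table that has it."""
    masked = []
    for table in tables:
        # Match the column name case-insensitively, as SQL identifiers usually are.
        actual = [c for c in column_names(conn, table) if c.lower() == column.lower()]
        if not actual:
            continue  # no such column in this table: nothing to mask
        col = actual[0]
        conn.execute(
            f'UPDATE "{table}" SET "{col}" = '
            f'(SELECT masked FROM "{map_table}" WHERE original = "{table}"."{col}") '
            f'WHERE "{col}" IN (SELECT original FROM "{map_table}")'
        )
        masked.append(table)
    conn.commit()
    return masked

# Hypothetical demo data; the table list plays the role of the process's input parameter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mask_map (original TEXT PRIMARY KEY, masked TEXT NOT NULL);
    INSERT INTO mask_map VALUES ('John', 'Mark');
    CREATE TABLE employee (id INTEGER, Name TEXT);
    CREATE TABLE contact (id INTEGER, name TEXT, phone TEXT);
    CREATE TABLE product (id INTEGER, title TEXT);
    INSERT INTO employee VALUES (1, 'John'), (2, 'Ann');
    INSERT INTO contact VALUES (1, 'John', '555-0100');
    INSERT INTO product VALUES (1, 'John Deere mower');
""")
done = mask_column_in_tables(conn, ["employee", "contact", "product"], "Name")
```

Here `done` lists the tables that were actually updated (`employee` and `contact`); `product` is skipped because it has no Name column. In a standalone script the table list could come from the command line instead of being hard-coded.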


Re: Using Nifi to mask sensitive data

Super Guru
@Shiv Kabra

I think there might be some confusion about what NiFi does. I also think you are making this more complex than it needs to be. First things first: there is a ReplaceText processor which you *might* be able to use to mask data by changing the content and replacing values with your masking values. It supports regular expressions.
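ReplaceText operates on the text content of a FlowFile. The Python snippet here is not NiFi code; it only mimics regex-based replacement driven by a hypothetical dictionary of original and masked values, to show the kind of expression involved (escaped alternatives, word boundaries, longest match first). In a real flow the values would come from the masking table.

```python
import re

def mask_text(content, mapping):
    """Replace every whole-word occurrence of a mapping key with its masked value."""
    if not mapping:
        return content  # an empty alternation would match at every position
    # Escape each original value and try longer ones first, so "John Smith"
    # wins over "John" when both are in the mapping.
    # Assumes keys start and end with word characters, so \b boundaries apply.
    alternatives = "|".join(re.escape(k) for k in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], content)
```

For example, with `{"John": "Mark"}` the line `1,John` becomes `1,Mark`, while `2,Johnny` is left alone because of the word boundaries.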

Now, since you are new to NiFi, I will try to give you an overview of what NiFi is purpose-built for. NiFi is a data flow management tool. It helps you create a data flow in a few minutes without writing a single line of code. NiFi enables you to ingest data from multiple sources using different protocols, where the data might be in different formats, and to process it by, for example, enriching metadata, changing formats (for example, JSON to Avro), filtering records, and tracking lineage. It can also move data securely across data centers (cloud and on-prem), send it to different destinations, and much more. Companies use NiFi to manage enterprise data flow. Its rich features include queuing (at each processor level), back pressure, and lineage.

2. Can I pass the list of tables as an input parameter to the process?

To do what? Which processor? Check the list of processors below:

https://nifi.apache.org/docs.html

3. Can I restart a process if there is a failure during execution?

Yes, this is one of the best features of NiFi. When a failure occurs, you can replay records, stop the flow at the processor level, make changes, and restart it.

4. Does NiFi have any built-in processor to handle such requests, i.e. masking sensitive information in tables?

I think ReplaceText should do what you are looking for. NiFi is extensible, so you can also write your own processor if none of the 200-plus built-in ones is enough for you. There is also an ExecuteScript processor that you can use to run external scripts.
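For the script route, here is a plain-Python filter of the kind a flow could hand data to. It assumes the script runs as an external command that reads FlowFile content (CSV with a header row) on stdin and writes the masked result to stdout, which is how NiFi's ExecuteStreamCommand processor invokes commands; ExecuteScript instead runs scripts inside NiFi against its session API, so the same logic would need wrapping there. `MASK_MAP` and the column-argument convention are hypothetical.

```python
import csv
import io
import sys

# Hypothetical mapping; a real script would load it from the masking table.
MASK_MAP = {"John": "Mark", "Alice": "Beth"}

def mask_csv_stream(src, dst, column, mapping):
    """Copy CSV rows from src to dst, masking `column` via `mapping`."""
    reader = csv.DictReader(src)
    if reader.fieldnames is None:
        return  # empty input: nothing to write
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row.get(column) in mapping:
            row[column] = mapping[row[column]]
        writer.writerow(row)

def main():
    # Hypothetical convention: the column to mask is the first command-line argument.
    column = sys.argv[1] if len(sys.argv) > 1 else "name"
    mask_csv_stream(sys.stdin, sys.stdout, column, MASK_MAP)

# Small in-memory demo instead of reading real stdin.
sample_out = io.StringIO()
mask_csv_stream(io.StringIO("id,name\n1,John\n2,Zoe\n"), sample_out, "name", MASK_MAP)
```

Saved as a file, the script would end with `if __name__ == "__main__": main()` and be invoked with the column name as its argument.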
