ETL lookup transformation equivalent in NiFi

Explorer

I am evaluating Apache NiFi for moving data into a Hive instance on HDP. While moving the data to Hive, I need to mask/transform some of the data attributes using a lookup table, similar to what lookup transformations do in traditional ETL tools. How can I achieve the same in NiFi?

1 ACCEPTED SOLUTION

Super Guru

One easy way to do this is to wrap the lookups in a REST API and call it as a step (InvokeHTTP).
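
As an illustration (my own sketch, not from the original answer), here is a bare-bones lookup service using the JDK's built-in HTTP server. The class name, port, and the DEPT_NO-style mapping values are placeholders; InvokeHTTP would call it with the key on the query string and the response body carries the replacement value:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class LookupService {

    // Hypothetical lookup/masking table; in practice load it from your real lookup table.
    private static final Map<String, String> LOOKUP = new HashMap<>();
    static {
        LOOKUP.put("10", "FINANCE");
        LOOKUP.put("20", "RESEARCH");
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8181), 0);
        // GET /lookup?key=10 returns FINANCE; unknown keys return a masked default.
        server.createContext("/lookup", (HttpExchange exchange) -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. "key=10"
            String key = (query == null) ? "" : query.replaceFirst("^key=", "");
            byte[] body = LOOKUP.getOrDefault(key, "UNKNOWN").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```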

Another way is to wrap the lookups in a command-line call and call it as a step (ExecuteStreamCommand).

Another option is a custom processor.
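
A rough sketch of what such a processor could look like (again my own illustration: a hard-coded map stands in for the real lookup table, and properties, validation, and a failure relationship are omitted):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class LookupAttributeProcessor extends AbstractProcessor {

    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("FlowFiles whose lookup attribute was replaced")
            .build();

    // Hypothetical in-memory lookup table; a real processor would load this from a file,
    // a database, or a DistributedMapCache in an @OnScheduled method.
    private static final Map<String, String> LOOKUP = new HashMap<>();
    static {
        LOOKUP.put("10", "FINANCE");
        LOOKUP.put("20", "RESEARCH");
    }

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // Replace the DEPT_NO attribute with its masked/looked-up value.
        String key = flowFile.getAttribute("DEPT_NO");
        String masked = LOOKUP.getOrDefault(key, "UNKNOWN");
        flowFile = session.putAttribute(flowFile, "DEPT_NO", masked);
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```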

Another option is to create a custom UDF in Hive that converts the data, and then run that.
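
As a minimal sketch using the classic Hive UDF API (the class name and the hard-coded department mapping are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class MaskDeptUDF extends UDF {

    // Hypothetical lookup table; this could also be a join against a small dimension table instead.
    private static final Map<String, String> LOOKUP = new HashMap<>();
    static {
        LOOKUP.put("10", "FINANCE");
        LOOKUP.put("20", "RESEARCH");
    }

    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(LOOKUP.getOrDefault(input.toString(), "MASKED"));
    }
}
```

You would then ADD JAR the built artifact, CREATE TEMPORARY FUNCTION pointing at the class, and call the function in the INSERT ... SELECT that loads the target Hive table.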

Another option is to do the ETL lookup transformations in Spark, Storm, or Flink and call them via Site-to-Site or Kafka.

Load the lookup values into the DistributedMapCache and use them for replacements
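
In flow terms that is often wired up roughly like this (assuming the PutDistributedMapCache and FetchDistributedMapCache processors are available in your NiFi version):

```
(lookup source)   -> ... -> PutDistributedMapCache                        # load each lookup key/value into the cache
(main data flow)  -> FetchDistributedMapCache -> RouteOnAttribute / ReplaceText   # pull the cached value into an attribute, then route or substitute
```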

Load lookup tables via SQL

ExecuteScript or ExecuteStreamCommand for looking up data to replace

Use ReplaceTextWithMapping (https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceTextWit...) to pull mappings from a file created from your tables.
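
If I remember that processor correctly, the mapping file is just plain text with one key/value pair per line (tab separated), so an export of your lookup table along these lines should work (values made up here; double-check the exact format against the processor documentation):

```
10	FINANCE
20	RESEARCH
30	SALES
```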

https://community.hortonworks.com/questions/36464/how-to-use-nifi-to-incrementally-ingest-data-from....

Or https://community.hortonworks.com/questions/37733/which-nifi-processor-can-send-a-flowfile-to-a-runn...

Lookup Table Service

https://github.com/aperepel/nifi-csv-bundle/blob/master/nifi-csv-processors/src/main/java/org/apache...

Use HBase for your lookups

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.hbase.GetHBase/


4 REPLIES

Did you find anything on this?


Hi Timothy,

I am trying the lookup-with-cache method:

Load the lookup values into the DistributedMapCache and use them for replacements

-----------

It doesn't seem to be working for me.

When I try to compare values between the two flows, it doesn't compare them.

I have a RouteOnAttribute that uses an expression like this: ${DEPT_NO:equals(${LKP_DEPT_NO})}

It doesn't send anything out. I checked the upstream queues; they have the correct values.

Can you please suggest how to compare the incoming attributes from the two flows?

I tried one more time and it worked for me. Thanks!