
ETL lookup transformation equivalent in Nifi

Rising Star

I am evaluating Apache NiFi for moving data to a Hive instance in HDP. While moving the data to Hive, I need to mask/transform some of the data attributes using a lookup table, similar to what a lookup transformation does in a traditional ETL tool. How can I achieve this in NiFi?

1 ACCEPTED SOLUTION

Master Guru

One easy way to do this is to wrap the lookups in a REST API and call it as a step in the flow (InvokeHTTP); a sketch of such a service is at the end of this answer.

Another way is to wrap the lookups in a command-line call and invoke it as a step (ExecuteStreamCommand); a sketch of such a script is also below.

Another option is a custom processor.

Another option is to create a custom UDF in Hive that converts the data, and then run that.

Another option is to do the ETL lookup transformations in Spark, Storm, or Flink and call them via Site-to-Site or Kafka.

Load the lookup values into the DistributedMapCache and use them for replacements.

Load the lookup tables via SQL.

Use ExecuteScript or ExecuteStreamCommand to look up the data to replace.

Use ReplaceTextWithMapping to pull mappings from a file created from your tables (a sketch of generating such a mapping file is at the end of this answer):

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceTextWit...

See also these related threads:

https://community.hortonworks.com/questions/36464/how-to-use-nifi-to-incrementally-ingest-data-from....

Or https://community.hortonworks.com/questions/37733/which-nifi-processor-can-send-a-flowfile-to-a-runn...

Lookup Table Service

https://github.com/aperepel/nifi-csv-bundle/blob/master/nifi-csv-processors/src/main/java/org/apache...

Use HBase for your lookups

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.hbase.GetHBase/
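
As mentioned in the list above, here is a minimal sketch of a REST lookup service that InvokeHTTP could call; the endpoint path, port, and department mapping are hypothetical stand-ins for a real lookup table:

# Minimal Flask sketch of a REST lookup service for the InvokeHTTP option.
# The endpoint path, port, and mapping values are hypothetical examples.
from flask import Flask, jsonify

app = Flask(__name__)

# In practice this table would be loaded from the real lookup table
# (e.g. from Hive or an RDBMS) rather than hard-coded.
DEPT_LOOKUP = {"10": "ACCOUNTING", "20": "RESEARCH", "30": "SALES"}

@app.route("/lookup/<dept_no>")
def lookup(dept_no):
    # Return the mapped value, or a masked placeholder when there is no match.
    return jsonify({"dept_no": dept_no,
                    "dept_name": DEPT_LOOKUP.get(dept_no, "MASKED")})

if __name__ == "__main__":
    app.run(port=8080)

InvokeHTTP could then be pointed at a Remote URL such as http://localhost:8080/lookup/${DEPT_NO}, letting the attribute value drive the lookup.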
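
A similar sketch of the command-line variant for ExecuteStreamCommand, which streams the FlowFile content through stdin/stdout; the CSV layout and the column holding the department number are assumptions:

#!/usr/bin/env python
# Sketch of a stdin/stdout lookup-and-mask filter for ExecuteStreamCommand.
# Reads CSV records from stdin, replaces the (assumed) third column via the
# lookup table, and writes the transformed records to stdout.
import csv
import sys

DEPT_LOOKUP = {"10": "ACCOUNTING", "20": "RESEARCH", "30": "SALES"}

reader = csv.reader(sys.stdin)
writer = csv.writer(sys.stdout)
for row in reader:
    if len(row) > 2:
        # Mask values that have no match rather than passing them through.
        row[2] = DEPT_LOOKUP.get(row[2], "MASKED")
    writer.writerow(row)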
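
And for the ReplaceTextWithMapping route, something along these lines could dump a lookup table into a mapping file; sqlite3 and the table/column names are stand-ins, and the exact mapping-file format should be verified against the processor documentation linked above:

# Sketch: export a SQL lookup table to a simple tab-delimited mapping file
# for ReplaceTextWithMapping. Database, table, and column names are
# hypothetical; verify the expected file format against the NiFi docs.
import sqlite3

conn = sqlite3.connect("lookup.db")
with open("dept_mapping.txt", "w") as out:
    for key, value in conn.execute("SELECT dept_no, dept_name FROM dept_lookup"):
        out.write("%s\t%s\n" % (key, value))
conn.close()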


4 REPLIES

Rising Star

Did you find anything on this?


Rising Star

Hi Timothy,

I am trying the lookup-with-cache method:

"Load the lookup values into the DistributedMapCache and use them for replacements."

It doesn't seem to be working for me. When I try to compare values between the two flows, they don't match.

I have a RouteOnAttribute processor that uses an expression like this: ${DEPT_NO:equals(${LKP_DEPT_NO})}

It doesn't send anything out. I checked the upstream queues, and they have the correct values.

Can you please suggest how to compare the incoming attributes from the two flows?
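
A note for anyone hitting the same problem: RouteOnAttribute evaluates its expression against a single FlowFile, so ${LKP_DEPT_NO} must already exist as an attribute on the same FlowFile that carries DEPT_NO (for example, fetched onto it with FetchDistributedMapCache) before the comparison can ever match. As a rough alternative, the same comparison could be done in an ExecuteScript processor with Jython; a minimal sketch, with the attribute names taken from this thread and everything else an assumption:

# ExecuteScript (Jython) sketch of the comparison described above.
# Assumes DEPT_NO and LKP_DEPT_NO are both attributes on the incoming FlowFile.
flowFile = session.get()
if flowFile is not None:
    dept_no = flowFile.getAttribute('DEPT_NO')
    lkp_dept_no = flowFile.getAttribute('LKP_DEPT_NO')
    if dept_no is not None and dept_no == lkp_dept_no:
        session.transfer(flowFile, REL_SUCCESS)   # values match
    else:
        # Covers both a mismatch and a missing LKP_DEPT_NO attribute.
        session.transfer(flowFile, REL_FAILURE)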

Rising Star

I tried one more time and it worked for me. Thanks!