Archives of Support Questions (Read Only)

This is an archived board kept for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

ETL lookup transformation equivalent in Nifi

Rising Star

I am evaluating Apache NiFi for moving data to a Hive instance in HDP. While moving the data to Hive, I have a requirement to mask/transform some of the data attributes using a lookup table, similar to what can be done with traditional ETL lookup transformations. How can I achieve the same in NiFi?

1 ACCEPTED SOLUTION

Master Guru

One easy way to do this is to wrap the lookup in a REST API and call it as a flow step (InvokeHTTP).
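As a minimal sketch of that first option, here is a hypothetical lookup microservice that an InvokeHTTP processor could call; the lookup table, port, and query parameter are illustrative, not from the thread:

```python
# Hypothetical lookup microservice for the InvokeHTTP approach.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

LOOKUP = {"D01": "Engineering", "D02": "Finance"}  # stand-in lookup table

def lookup(key):
    # Mask unknown keys instead of failing the request.
    return LOOKUP.get(key, "UNKNOWN")

class LookupHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        key = parse_qs(urlparse(self.path).query).get("key", [""])[0]
        body = json.dumps({"key": key, "value": lookup(key)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve (InvokeHTTP would then GET http://host:8080/?key=...):
#   HTTPServer(("", 8080), LookupHandler).serve_forever()
```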

Another way is to wrap the lookup in a command-line call and call it as a step (ExecuteStreamCommand).
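A sketch of the command-line variant: a stdin-to-stdout masking filter that ExecuteStreamCommand could invoke. The table contents and column index are illustrative:

```python
# Hypothetical masking filter for the ExecuteStreamCommand approach.
LOOKUP = {"D01": "Engineering", "D02": "Finance"}

def mask_line(line, col=1, sep=","):
    # Replace the lookup column in a delimited record; unknown codes
    # are masked rather than passed through.
    fields = line.rstrip("\n").split(sep)
    if col < len(fields):
        fields[col] = LOOKUP.get(fields[col], "MASKED")
    return sep.join(fields)

# Under ExecuteStreamCommand the script would run as a pipe:
#   import sys
#   for line in sys.stdin:
#       print(mask_line(line))
```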

Another option is a custom processor.

Another option is to create a custom UDF in Hive that converts the data, then run that.

Another option is to do the ETL lookup transformations in Spark, Storm, or Flink and call them via Site-to-Site or Kafka.

Load the lookup values into the DistributedMapCache and use them for replacements.
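A sketch of that cache-based step, with a plain dict standing in for NiFi's DistributedMapCache (which a separate flow would populate, e.g. via PutDistributedMapCache); the values are illustrative:

```python
# Dict stand-in for the DistributedMapCache replacement step.
cache = {}

def load_lookup(rows):
    # rows: (key, replacement) pairs pulled from the lookup table
    for key, value in rows:
        cache[key] = value

def replace(value):
    # Pass values with no mapping through unchanged.
    return cache.get(value, value)

load_lookup([("D01", "Engineering"), ("D02", "Finance")])
```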

Load lookup tables via SQL.
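A sketch of the SQL load step; sqlite3 stands in for the real source database, and the table/column names are illustrative:

```python
# Load a lookup table via SQL into an in-memory mapping.
import sqlite3

def load_lookup_via_sql(conn):
    cur = conn.execute("SELECT dept_no, dept_name FROM dept_lookup")
    return dict(cur.fetchall())

# In-memory demo of the load step
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept_lookup (dept_no TEXT, dept_name TEXT)")
conn.executemany("INSERT INTO dept_lookup VALUES (?, ?)",
                 [("D01", "Engineering"), ("D02", "Finance")])
lookup = load_lookup_via_sql(conn)
```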

Use ExecuteScript or ExecuteStreamCommand to look up the data to replace.

Use ReplaceTextWithMapping to pull mappings from a file created from your tables:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceTextWit...
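As a sketch, the mapping file for ReplaceTextWithMapping can be generated from the lookup table. The tab-separated key/value format below is my assumption (check the processor docs), and `apply_mappings` is only a rough Python emulation of the replacement:

```python
# Build a mapping file from lookup-table rows (assumed format:
# one "key<TAB>value" mapping per line).
def write_mapping_file(path, rows):
    with open(path, "w") as f:
        for key, value in rows:
            f.write(f"{key}\t{value}\n")

def apply_mappings(text, rows):
    # Rough emulation of the replacement applied to FlowFile content;
    # the real processor is configured with a matching regex.
    for key, value in rows:
        text = text.replace(key, value)
    return text
```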

https://community.hortonworks.com/questions/36464/how-to-use-nifi-to-incrementally-ingest-data-from....

Or https://community.hortonworks.com/questions/37733/which-nifi-processor-can-send-a-flowfile-to-a-runn...

Lookup Table Service

https://github.com/aperepel/nifi-csv-bundle/blob/master/nifi-csv-processors/src/main/java/org/apache...

Use HBase for your lookups

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.hbase.GetHBase/


4 REPLIES

Rising Star

Did you find anything on this?


Rising Star

Hi Timothy,

I am trying the lookup-with-cache method you suggested:

Load the lookup values into the DistributedMapCache and use them for replacements

-----------

It doesn't seem to be working for me.

When I try to compare values between the two flows, the comparison never matches.

I have a RouteOnAttribute which uses an expression like this: ${DEPT_NO:equals(${LKP_DEPT_NO})}

It doesn't send anything out. I checked the upstream queues; they have the correct values.

Can you please suggest how to compare the incoming attributes from the two flows?
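For reference, RouteOnAttribute evaluates its expression against one FlowFile at a time, so DEPT_NO and LKP_DEPT_NO must both be attributes on the same FlowFile before the comparison can succeed; a reference to an attribute the FlowFile does not carry yields no value, and the equals() check then fails. A toy model of that evaluation (plain Python, not NiFi code; attribute names mirror the expression above):

```python
# Toy model of how RouteOnAttribute evaluates
# ${DEPT_NO:equals(${LKP_DEPT_NO})} against one FlowFile's attributes.
def matches(flowfile_attributes):
    dept = flowfile_attributes.get("DEPT_NO")
    lkp = flowfile_attributes.get("LKP_DEPT_NO")
    # A missing attribute yields no value, so the comparison fails.
    return dept is not None and lkp is not None and dept == lkp

# Both attributes on the SAME FlowFile -> the route matches:
assert matches({"DEPT_NO": "10", "LKP_DEPT_NO": "10"})
# Attributes split across two different FlowFiles -> never matches:
assert not matches({"DEPT_NO": "10"})
```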

Rising Star

I tried one more time and it worked for me. Thanks!