Support Questions
Find answers, ask questions, and share your expertise

Which ETL tool would be best to use with NiFi, given that NiFi cannot perform joins on data? Our staging data is in Redshift, and we need an ETL tool to perform joins and transformations on the data to load the denormalized tables.

@Timothy Spann, @matt burgess: We are using NiFi to extract data and prepare the staging layer in Redshift, but for the denormalized layer we need to use a different tool, Matillion, for transformations on the data such as joins. Could anyone suggest whether we can do the transformations in NiFi itself and send data to the denormalized layer directly after extracting from the sources in NiFi?

Please Help!!


Super Guru

Yes, most transformations can be done in NiFi, such as changing types, dropping columns, changing formats, and converting JSON to Avro or CSV to JSON. Lots of transformations.
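To make the kinds of record-level transformations listed above concrete, here is a minimal Python sketch (outside NiFi; the column names are invented for the example) of changing a type, dropping a column, and converting CSV to JSON:

```python
import csv
import io
import json

# Hypothetical CSV input; the column names are assumptions for illustration.
raw = "policynumber,premium,internal_flag\nWC2494P2017,1250.50,Y\n"

records = []
for row in csv.DictReader(io.StringIO(raw)):  # read CSV into one dict per record
    row["premium"] = float(row["premium"])    # change type: string -> float
    row.pop("internal_flag", None)            # drop an unwanted column
    records.append(row)

print(json.dumps(records))                    # format conversion: CSV -> JSON
```

In NiFi these same steps would be handled by individual processors (e.g. record readers/writers and attribute or record transforms) rather than custom code.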

  1. @Timothy Spann, @bryan Bende, @matt Burgess, @Matt clarke: Thanks, Timothy, for the quick reply, but my biggest concern is that since joins are not available in NiFi, am I compelled to use a different ETL tool to load the denormalized tables by joining the staging tables? My denormalized tables contain columns from more than one staging table. I tried picking columns from the individual tables and combining them into one JSON using the MergeContent processor, but was not able to do so. Could you please suggest another approach for joining data from two different flowfiles into one to produce a single wide row?

I read your answer on joining tables in NiFi in this post,

but I am not able to use MergeContent to get the desired wide row.

I have two sources that output JSON records: the first record has two fields, policynumber and xmljoinkey (a hash value of an XML document); the other JSON has two fields, transactionid and xmljoinkey (the same hash). I want to join the two JSONs on xmljoinkey and output a single JSON with three fields: xmljoinkey, policynumber, and transactionid.

but instead I am getting output like this:

{ "policynumber": "WC2494P2017", "xmljoinkey": "c1b40e1c4c611a4e5ede35176ab3abfdd0f45eb5" }

{ "policynumber": "WC2494P2017", "xmljoinkey": "c1b40e1c4c611a4e5ede35176ab3abfdd0f45eb5" }

{ "policynumber": "WC2494P2017", "xmljoinkey": "c1b40e1c4c611a4e5ede35176ab3abfdd0f45eb5" }

{ "xmljoinkey": "c1b40e1c4c611a4e5ede35176ab3abfdd0f45eb5", "transactionid": "1234" }

{ "xmljoinkey": "c1b40e1c4c611a4e5ede35176ab3abfdd0f45eb5", "transactionid": "5647" }

Could you please suggest an approach for this?
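For what it's worth, the join described above is a plain hash join on xmljoinkey, which is easy to sketch outside NiFi (for example in a script called from an ExecuteScript processor, or downstream of it). This is only a minimal illustration using the sample records from the post, not a NiFi-specific recipe:

```python
import json

# Sample records in the shape shown above (field names taken from the
# sample output: policynumber/xmljoinkey and transactionid/xmljoinkey).
policies = [
    {"policynumber": "WC2494P2017",
     "xmljoinkey": "c1b40e1c4c611a4e5ede35176ab3abfdd0f45eb5"},
]
transactions = [
    {"xmljoinkey": "c1b40e1c4c611a4e5ede35176ab3abfdd0f45eb5",
     "transactionid": "1234"},
    {"xmljoinkey": "c1b40e1c4c611a4e5ede35176ab3abfdd0f45eb5",
     "transactionid": "5647"},
]

def join_on_key(left, right, key):
    """Hash join: index the left records by key, then probe with the right."""
    index = {}
    for rec in left:
        index.setdefault(rec[key], []).append(rec)
    joined = []
    for rec in right:
        for match in index.get(rec[key], []):
            joined.append({**match, **rec})  # merge fields into one wide row
    return joined

for row in join_on_key(policies, transactions, "xmljoinkey"):
    print(json.dumps(row))
```

Each output row then carries all three fields (xmljoinkey, policynumber, transactionid), which is the single wide row MergeContent cannot produce, since it concatenates flowfile contents rather than matching records on a key.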

Thanks in advance!!

I read the answer by Bryan Bende on joins in NiFi:

so I think I might not be able to do this in NiFi. Still, if anyone finds a way to do it, please share.

Thanks a lot 🙂


@srijitachaturve Based on my experience evaluating ETL tools, I have found the tools below helpful for joining data across tables.

  1. Panoply: Panoply combines a cloud ETL service with a managed data warehouse. With 100+ data connectors, ETL and data ingestion are quick and simple, with only a few clicks and a login between you and your newly integrated data. Under the hood, Panoply actually uses an ELT approach (as opposed to traditional ETL), which makes data ingestion much faster and more robust, since you don't have to wait for transformations to complete before loading your data. And since Panoply builds a managed cloud data warehouse for each customer, you won't need to set up a separate destination to store all the data you pull in using Panoply's ELT process. If you would rather use Panoply's rich set of data connectors to set up ETL pipelines into an existing data warehouse, Panoply can also manage ETL processes for your Azure SQL Data Warehouse.
  2. Atom: Atom, from ironSource, is a data pipeline platform that streams data into a data warehouse in near real time. Atom lets you customize the data flow to your requirements, which helps manage data more efficiently. Atom's transformation code is written in Python, which helps turn raw logs into queryable fields and insights. It provides a collection layer that supports sending data from any source, in any format, to the target data store in near real time. Atom also has pause-and-play options; resuming the data flow without losing a single event is an important capability of Atom for maintaining data integrity.
  3. Sprinkle Data: Sprinkle is a SaaS platform providing an ETL tool for organisations. Its easy-to-use UX and code-free mode of operation make it easy for technical and non-technical users to ingest data from multiple data sources and derive real-time insights from the data. A free trial lets users try the platform first and pay only if it fulfils their requirements.