Support Questions


NiFi joins using ExecuteSQL for larger tables

New Contributor

I am trying to join multiple tables using NiFi. The data source may be MySQL or Redshift, and possibly something else in the future. Currently I am using the ExecuteSQL processor for this, but the output is a single flowfile, so it may not be suitable for terabytes of data. I have also tried GenerateTableFetch, but it doesn't have a join option.

Here are my Questions:

  1. Is there any alternative to the ExecuteSQL processor?
  2. Is there a way to make the ExecuteSQL processor output multiple flowfiles? Currently I can split the output of ExecuteSQL using the SplitAvro processor, but I want ExecuteSQL itself to split the output.
  3. GenerateTableFetch generates SQL queries based on offset. Will this slow down the process as the dataset becomes larger?

Please share your thoughts. Thanks in advance.
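
On question 3, here is a rough sketch of the trade-off being asked about. It is not how GenerateTableFetch is implemented, only an illustration of offset paging next to keyset paging over a join. SQLite stands in for MySQL or Redshift, and the `orders` and `customers` tables, their columns, and the function names are all made up for the example. With offset paging the database typically has to walk past every skipped row on each page, so later pages tend to get slower; keyset paging filters on the last key seen instead, which an index on that key can usually satisfy directly.

```python
import sqlite3

# Hypothetical join used for illustration only.
JOIN_SQL = (
    "SELECT o.id, c.name, o.amount "
    "FROM orders o JOIN customers c ON c.id = o.customer_id "
)


def fetch_join_by_offset(conn, page_size):
    """Yield pages using LIMIT/OFFSET (skipped rows are re-read on every page)."""
    offset = 0
    while True:
        rows = conn.execute(
            JOIN_SQL + "ORDER BY o.id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += page_size


def fetch_join_by_keyset(conn, page_size):
    """Yield pages by filtering on the last o.id seen (assumes positive ids)."""
    last_id = 0
    while True:
        rows = conn.execute(
            JOIN_SQL + "WHERE o.id > ? ORDER BY o.id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # first column of the last row is o.id


def build_demo_db():
    """Create a small in-memory database with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 4)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 3) + 1, i * 1.5) for i in range(1, 11)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for page in fetch_join_by_keyset(conn, page_size=4):
        print(len(page), "rows, last order id", page[-1][0])
```

Both functions return the same pages; the difference is only in how much work the database does to reach each one. Each page could become its own flowfile, which is the same effect as splitting the ExecuteSQL output, provided the join has a unique, ordered key to page on.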


Cloudera Employee

NiFi is not the best tool for table joins. It can handle some transformations, but the heavy lifting should be done with Spark, or with Storm if real-time processing is needed.

Have a look at this thread: