Created 04-21-2017 07:44 PM
Hello,
we are trying to use NiFi to ingest 3 data files and then join them based on certain values.
This would be very straightforward with Spark (RDD/DataFrame). I am wondering whether I can do this using NiFi as well?
Created 04-21-2017 07:47 PM
Join them in what way? Binary concatenation? Tar? Something else? This is likely something that can easily be done using NiFi's MergeContent processor.
Created 04-21-2017 07:50 PM
Sorry, I meant something like a SQL join, as you would do with a Spark RDD.
Created 04-24-2017 02:00 PM
Hello Mat,
just confirming whether joining is an option in NiFi, or should we go with Spark for this use case?
Created 04-21-2017 07:50 PM
Created 04-21-2017 08:07 PM
We are getting 3 sets of files, and we want to do a bunch of business validation before ingesting/processing them further. That requires SQL joins between these 3 files.
We know how to solve that problem using Spark SQL, but we are not sure how to solve it with NiFi.
Created 04-24-2017 02:00 PM
Hello Wynner,
just confirming whether joining is an option in NiFi, or should we go with Spark for this solution?
Created 05-24-2017 05:12 PM
This is not currently possible inside NiFi (without scripting pretty much the entire capability), but with the Record Reader/Writer capabilities added in NiFi 1.2.0, a JoinRecord processor could be possible, as long as each incoming flow file had a schema associated with it. One tricky part with a data "flow" is knowing that you have three files, and they are the (only) three files you want. Usually a flow will have any number of files coming in at any time. In this case such a JoinRecord processor would have to be configurable to wait for N flow files and assume they can all be joined.
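To make the "wait for N flow files" point concrete, here is a minimal sketch in plain Python. It is not NiFi code and not an existing processor; `JoinBuffer`, `batch_id`, and the source names are hypothetical, and it assumes each file is CSV with a header row and that all files share one key column. It only illustrates the logic such a JoinRecord-style processor (or a scripted processor) would need: hold files until the full set has arrived, then join them.

```python
import csv
import io
from collections import defaultdict


class JoinBuffer:
    """Hold incoming files per batch until every expected source has arrived."""

    def __init__(self, expected_sources, key):
        self.expected = set(expected_sources)  # names of the N files to wait for
        self.key = key                         # column shared by every file
        self.pending = defaultdict(dict)       # batch_id -> {source: rows}

    def add(self, batch_id, source, csv_text):
        """Store one file; return joined rows once the batch is complete, else None."""
        rows = list(csv.DictReader(io.StringIO(csv_text)))
        self.pending[batch_id][source] = rows
        if set(self.pending[batch_id]) != self.expected:
            return None  # still waiting for the other files
        return self._join(self.pending.pop(batch_id))

    def _join(self, files):
        """Inner join on self.key across all sources, producing wide rows."""
        sources = sorted(files)
        joined = [dict(row) for row in files[sources[0]]]
        for source in sources[1:]:
            # Index the right-hand side by key so each lookup is cheap.
            index = defaultdict(list)
            for row in files[source]:
                index[row[self.key]].append(row)
            joined = [
                {**left, **right}
                for left in joined
                for right in index.get(left[self.key], [])
            ]
        return joined
```

With three expected sources, the first two calls to `add` return `None`, and the third returns the joined rows. A real flow would also need a timeout or expiry for batches that never complete, which this sketch leaves out.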
In future releases of NiFi, you should have more options as more LookupService implementations are added.
In the meantime, you might consider using Presto. You could set up a DBCPConnectionPool and a SQL processor (such as ExecuteSQL) inside NiFi to use the Presto JDBC driver, and execute the JOIN(s) against files on the filesystem (using a LocalFileConnector, e.g.).
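As a rough illustration of the kind of JOIN query such a SQL processor would submit, here is a small Python sketch. Note the assumptions: `sqlite3` is used purely as a stand-in so the example runs without a Presto cluster, and the table and column names (`customers`, `orders`, `payments`) are made up. Against Presto the tables would be addressed through whatever catalog and schema your connector exposes, and the SQL dialect may differ in details.

```python
import sqlite3

# The three-way join; this is the part that would be handed to the SQL processor.
JOIN_SQL = """
SELECT c.customer_id, c.name, o.order_id, p.amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
LEFT JOIN payments p ON p.order_id = o.order_id
ORDER BY o.order_id
"""


def run_join():
    """Load tiny sample tables into an in-memory database and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        CREATE TABLE orders    (order_id INTEGER, customer_id INTEGER);
        CREATE TABLE payments  (order_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO orders    VALUES (10, 1), (11, 2);
        INSERT INTO payments  VALUES (10, 25.0);
    """)
    rows = conn.execute(JOIN_SQL).fetchall()
    conn.close()
    return rows


rows = run_join()
# Example business validation: orders that have no matching payment.
unpaid = [row for row in rows if row[3] is None]
```

The LEFT JOIN keeps orders without a payment so that a validation step can flag them; an inner join would silently drop them.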
Created 05-24-2017 05:46 PM
Thank you, @Matt Burgess. Yes, we pretty much did what you suggested: highly customized processor(s) for this feature. We created 2 queues (FILE_READY and FILE_NOT_READY), and we only read from FILE_READY...
Created 11-29-2017 06:34 AM
Hi @Matt Burgess, I am looking for the same functionality described above (i.e., joining files that contain records from three different tables on a common field to get a wide row). Please let us know whether newer versions of NiFi support processors for this activity.
Thanks, Sri