question Re: How to split the output flow file from ExecuteSQL (query perform joins)? in Support Questions

How to split the output flow file from ExecuteSQL (query perform joins)?

CaioFalco — Thu, 03 Sep 2020 14:31:15 GMT

Hello.
I'm using an ExecuteSQL Processor to extract data from Oracle DB. The query has multiple joins and returns a large number of fields.

The problem is that ExecuteSQL Processor returns a single flow file (avro format) that is huge.

I want to split the flow file (based in a number of rows, for example) and then merge them at the proper moment.

I have read about QueryDatabaseTable and GenerateFetchTable which are processors that can split the output flow files but looks like these processors aren't able to perform joins..

Does anyone knows a workaround?

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

Faerballert — Thu, 03 Sep 2020 19:10:27 GMT

At ExecuteSQL Processor is a property named by "Max Rows Per Flow File".

There you can set how much rows each Flow File should be contain and later you can merge them like you wanted cause the flow files get an fragment attribute.

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

CaioFalco — Thu, 03 Sep 2020 20:21:18 GMT

There is no such a option at ExecuteSQL..

Perhaps you're using a newer version of NiFi: Mine is 1.5.0.3.1.2.0-7

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

SagarKanani — Fri, 04 Sep 2020 07:12:52 GMT

You should use ListDatabaseTable and generatetablefetch to perform an incremental load. If you are joining the tables, you can do a replacetext after generatetablefetch to add the join query and then feed the flowfile to execute sql. You can split the amount of data in generatetablefetch.

You can use splitrecord / splitcontent to split the single avro to multiple smaller files and then use mergecontent to merge them back if required.

Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

CaioFalco — Fri, 04 Sep 2020 13:56:58 GMT

I don't know if these solutions works for me.

What I really wanna do is make ExecuteSQL work as (for example) SelectHiveQL, which means, ExecuteSQL get only one incoming flow file and sends forward multiple flows file that can be merged.

My real problem is that the ExecuteSQL query sometimes needs to produce a flowfile whose size is too large for the edge machine to process, what ends in error.. so, I need to split the flow file to decrease the pressure on the edge machine...
I've been through the same situation in Hive queries, but I solved using the "Max Rows Per Flow File" property and then merging the flowfiles.

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

SagarKanani — Mon, 07 Sep 2020 11:33:56 GMT

Why not use the 2nd option i said above....Use splitcontent or splitrecord and then merge it later whenever you want it.

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

CaioFalco — Tue, 08 Sep 2020 14:17:14 GMT

My point is, the error occurs when the ExecuteSQL is running. The cause is: ExecuteSQL needs to create a huge flow file that the edge machine doest not have enough processing power to create.. Your solutions looks good, but it would split the flow file after it was produced by the ExecuteSQL, your solution takes action in a moment when the error has already occurred.

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

SagarKanani — Tue, 08 Sep 2020 14:34:37 GMT

Then im afraid its difficult to do so. I dont understand how you are feeding the queries to execute sql. Maybe its good to feed executesql with manageable queries. If you are using GenerateTableFetch then it allows you to break a big query into smaller queries like you want and feed it to ExecuteSQL. Hope this helps. Please do post back on how to managed to move forward.

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

CaioFalco — Tue, 08 Sep 2020 14:45:30 GMT

The process is triggered by a GetSFTP that retrieves a date file.. Then the query uses the date for filtering..

Thanks for your contribution.. As soon as I make progress I'll update you guys.

Re: How to split the output flow file from ExecuteSQL (query perform joins)?

CaioFalco — Thu, 10 Sep 2020 19:54:01 GMT

Maybe I have found a solution..

I'm gonna use the ExecuteSQL to do a "select insert" query.. The query will perform the joins and load the data into a table. Then the QueryDatabaseTable will read from the new table.. That way I'll be able to use the "Max Rows Per Floe File" property.