Created 04-08-2018 05:28 AM
Hi there,
Fairly typical data flow requirement: CSV file, insert into a MariaDB staging table, do some clean-up, then import into a permanent table.
Got all of that working, except the bit where it needs to truncate the staging table before doing the insert.
The current process is GenerateFlowFile (for testing purposes) -> PutDatabaseRecord -> PutSQL (to run the post-insert SQL).
I have an ExecuteSQL processor ready to go with a TRUNCATE TABLE statement in it. I've tried inserting it between GenerateFlowFile and PutDatabaseRecord, connecting it on the failure relationship, but I'm not getting any data through it.
Any ideas?
Created on 04-14-2018 03:41 AM - edited 08-17-2019 10:05 PM
If you are using NiFi 1.5+, you can use a PutSQL processor with its SQL Statement property set to your TRUNCATE statement. Because this processor doesn't change the contents of the flowfile, the same flowfile can then go to the PutDatabaseRecord processor, which prepares and executes the insert statements, and finally to a PutSQL processor for the post-insert step.
JIRA that added the SQL Statement property to the PutSQL processor:
https://issues.apache.org/jira/browse/NIFI-4522
Flow: GenerateFlowFile --> PutSQL (TRUNCATE) --> PutDatabaseRecord --> PutSQL (post insert)
If you are running a version prior to NiFi 1.5, use an ExecuteScript processor to run the TRUNCATE statement against the target database, and then use the PutDatabaseRecord and PutSQL processors. Please refer to this link for more details.
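Whichever NiFi version is used, the order of operations in the flows described above is the same: truncate, bulk insert, post-insert. The sketch below shows that ordering outside NiFi. It assumes hypothetical `staging` and `permanent` tables with `id`/`name` columns and uses Python's built-in `sqlite3` as a stand-in for MariaDB; SQLite has no `TRUNCATE TABLE`, so `DELETE FROM` plays that role.

```python
import csv
import io
import sqlite3

# Hypothetical tables; SQLite stands in for MariaDB in this sketch.
# SQLite has no TRUNCATE TABLE, so DELETE FROM plays that role.

def load_batch(conn: sqlite3.Connection, csv_text: str) -> None:
    """Truncate staging, insert the CSV rows, then run the post-insert step."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    with conn:  # commits on success, rolls back if any statement raises
        # Step 1: empty the staging table (TRUNCATE TABLE staging in MariaDB).
        conn.execute("DELETE FROM staging")
        # Step 2: bulk insert, roughly the job PutDatabaseRecord does.
        conn.executemany(
            "INSERT INTO staging (id, name) VALUES (:id, :name)", rows
        )
        # Step 3: clean up and copy to the permanent table (the final PutSQL).
        conn.execute("UPDATE staging SET name = TRIM(name)")
        conn.execute(
            "INSERT INTO permanent (id, name) SELECT id, name FROM staging"
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE permanent (id INTEGER, name TEXT)")
load_batch(conn, "id,name\n1, alice \n2,bob\n")
load_batch(conn, "id,name\n3,carol\n")
print(conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0])    # 1
print(conn.execute("SELECT COUNT(*) FROM permanent").fetchone()[0])  # 3
```

One caveat: in MariaDB, `TRUNCATE TABLE` causes an implicit commit, so unlike the `DELETE` here it cannot be rolled back together with the rest of the batch.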
Created 04-14-2018 07:40 PM
If you have more than one file to process, use a MergeContent processor after the GetFile processor. MergeContent merges several files into one file.
Flow:
GetFile --> MergeContent --> Truncate table --> Insert CSV into table --> Clean up
By using the MergeContent processor, we process one (merged) file at a time even when more than one file arrives. PutDatabaseRecord, like all record-based processors in NiFi, is quite powerful and can handle millions of records.
Please refer to the links below for how to configure the MergeContent processor:
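The point of merging first is that the truncate/insert/clean-up sequence then runs once per merged batch instead of once per file. As a rough, hypothetical stand-in for that merge step, the sketch below concatenates several CSV texts that share a header into one batch, keeping only the first header line (plain byte concatenation would repeat each file's header, which a CSV reader would then treat as a data row). The function name `merge_csv` is illustrative only, not a NiFi API.

```python
import csv
import io

def merge_csv(files: list[str]) -> str:
    """Merge CSV texts with identical headers into one CSV text.

    Hypothetical stand-in for a merge step: several incoming files
    become a single batch with a single header line.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for text in files:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader)
        if header is None:
            header = file_header
            writer.writerow(header)
        elif file_header != header:
            # Refuse to merge files whose columns don't line up.
            raise ValueError(f"header mismatch: {file_header} != {header}")
        writer.writerows(reader)  # copy the remaining data rows
    return out.getvalue()

merged = merge_csv(["id,name\n1,alice\n", "id,name\n2,bob\n3,carol\n"])
print(merged)
```

The merged text then goes through the truncate, insert and clean-up steps a single time, so the second file can no longer trigger a truncate in the middle of the first file's processing.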
https://community.hortonworks.com/questions/64337/apache-nifi-merge-content.html
https://community.hortonworks.com/questions/161827/mergeprocessor-nifi-using-the-correlation-attribu...
https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.ht...
In addition, NiFi has Wait and Notify processors. Wait routes incoming FlowFiles to the 'wait' relationship until a matching release signal is stored in the distributed cache by a corresponding Notify processor. When a matching release signal is identified, a waiting FlowFile is routed to the 'success' relationship.
http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/
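The Wait/Notify pattern described above can be modelled in a few lines to show how it serializes batches: file 2 waits on a release signal that is only written after file 1's clean-up. This is a simplified, in-memory sketch; `SignalCache` and the signal name `file1-done` are hypothetical, and a `threading.Condition` stands in for NiFi's distributed cache. The real processors have more options (for example signal counts and an expiration path).

```python
import threading

class SignalCache:
    """Hypothetical in-memory stand-in for the cache Wait/Notify share."""

    def __init__(self):
        self._signals = {}
        self._cond = threading.Condition()

    def notify(self, signal_id: str, count: int = 1) -> None:
        # Notify side: record (or increment) the release signal.
        with self._cond:
            self._signals[signal_id] = self._signals.get(signal_id, 0) + count
            self._cond.notify_all()

    def wait(self, signal_id: str, timeout: float) -> bool:
        # Wait side: block until a signal exists, then consume one.
        # True ~ routed to 'success'; False ~ still waiting at timeout.
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self._signals.get(signal_id, 0) > 0, timeout
            )
            if ok:
                self._signals[signal_id] -= 1
            return ok

cache = SignalCache()
order = []

def process(name, wait_for=None, release=None):
    if wait_for and not cache.wait(wait_for, timeout=5):
        order.append(f"{name}: timed out")
        return
    order.append(f"{name}: truncate, insert, clean up")
    if release:
        cache.notify(release)  # signal only after clean-up has finished

# Start file 2 first; it still has to wait for file 1's release signal.
t2 = threading.Thread(target=process, args=("file2",),
                      kwargs={"wait_for": "file1-done"})
t2.start()
process("file1", release="file1-done")
t2.join()
print(order)
```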
Created 04-14-2018 07:04 PM
Thanks Shu. The difficulty with the solution as written is when I have more than one file to process. I need to:
get file -> truncate table -> insert CSV into table -> clean up
...all in one "transaction", as it were, so that the truncation triggered by the second file doesn't happen until after the clean-up for the first has finished. I'm considering using some sort of semaphore table. Thoughts?