
Loading csv files into Oracle DB with NiFi

Expert Contributor

I have a flow that picks up zipped CSV files and loads them into a table in Oracle 12c. It has been tested successfully with 20 records, but when I try a larger CSV file (over 40k records), I get the error message "Failed to update DB due to a failed batch update. There were a total of 1 flowfiles that failed, 0 that succeeded and 99 that were not executed and will be routed to retry;". The Batch Size on the PutSQL processor is currently set to 100. Is there a setting I am missing?

GetSFTP---UnpackContent---SplitText---ExtractText---ReplaceText---PutSQL
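For context, the tail of this flow turns each CSV line into one INSERT statement for PutSQL. A rough Python sketch of what the ExtractText + ReplaceText stages are expected to produce per line (the table name, column names, and regex are illustrative assumptions, not the actual processor configuration):

```python
import re

# Hypothetical regex standing in for the ExtractText configuration:
# capture three comma-separated fields from one CSV line.
LINE_PATTERN = re.compile(r"^([^,]*),([^,]*),([^,]*)$")

def line_to_insert(line: str) -> str:
    """Mimic ExtractText + ReplaceText: one CSV line -> one INSERT statement."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Line does not match the expected CSV shape: {line!r}")
    f1, f2, f3 = match.groups()
    # ReplaceText would substitute the captured attributes into a template
    # like this to build the FlowFile content PutSQL executes.
    return f"INSERT INTO my_table (col1, col2, col3) VALUES ('{f1}', '{f2}', '{f3}')"

print(line_to_insert("1,Lagos,2017-01-10"))
```

If a header line slips through the split, it goes through this same template, which is exactly how header text ends up inside an INSERT and makes the whole batch fail.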



Master Guru

Can you check the logs (in logs/nifi-app.log) to see what the error was? Also, what is the content of the first flow file to go into PutSQL (that is being routed to failure)? Does that SQL statement work when executed from something like SQL Developer?

Expert Contributor

Hi @Matt Burgess, the first SQL statement is the issue. I can see it is picking up the headers in the CSV file, even though I skipped them with the Header Line Count property in SplitText.

2017-01-10 12:07:13,538 ERROR [Timer-Driven Process Thread-40] o.apache.nifi.processors.standard.PutSQL PutSQL[id=79bf32ff-e154-1a02-b109-ebd298dfab2e] Failed to update database due to a failed batch update. There were a total of 1 FlowFiles that failed, 0 that succeeded, and 99 that were not executed and will be routed to retry;

Super Mentor

@Joshua Adeleke

Also check your NiFi app log for any Out Of Memory Errors (OOME). The SplitText processor may be running into memory issues trying to split over 40k records. You could try two SplitText processors in series: the first splitting with a "Line Split Count" of 10,000, and the second splitting those 10,000-line FlowFiles with a "Line Split Count" of 1. This greatly reduces the heap memory footprint.

In addition, if you do a listing on the queue feeding your PutSQL processor, do you see any FlowFiles with an unexpected size?
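As a loose analogy for the two-stage split suggested above (this is not NiFi code; only the chunk sizes of 10,000 and 1 come from the suggestion), splitting in two passes means only one 10,000-line chunk is ever expanded into single lines at a time:

```python
from typing import Iterator, List

def split_lines(lines: List[str], count: int) -> Iterator[List[str]]:
    """One SplitText stage: break the input into chunks of `count` lines."""
    for i in range(0, len(lines), count):
        yield lines[i:i + count]

def two_stage_split(lines: List[str]) -> Iterator[List[str]]:
    """Stage 1 yields 10,000-line chunks; stage 2 splits each chunk into
    1-line chunks. Peak memory is bounded by one 10,000-line chunk instead
    of all 40k+ single-line FlowFiles at once."""
    for chunk in split_lines(lines, 10_000):
        yield from split_lines(chunk, 1)

lines = [f"row-{i}" for i in range(40_000)]
singles = list(two_stage_split(lines))
print(len(singles))  # one FlowFile per record
```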

Matt

Expert Contributor

Hi @Matt, there are no OOME errors in the NiFi app log. The SplitText processor splits the files successfully; the issue seems to be at the PutSQL processor. Although I set the Header Line Count property on SplitText to skip 2 lines, the INSERT statements reaching PutSQL have the CSV file headers as their values.

Super Mentor

@Joshua Adeleke

You could extract the header bits from the first two lines into FlowFile attributes before the SplitText processor. All the FlowFiles that come out of SplitText will then carry these new attributes as well, and you can use them in your PutSQL.

Expert Contributor

@Matt

Please, how do I extract the headers without using the SplitText processor?

Super Mentor

@Joshua Adeleke

The SplitText processor simply splits the content of an incoming FlowFile into multiple FlowFiles. It lets you designate how many lines should be considered the header and ignored, but it does not extract any content into FlowFile attributes.

The ExtractText processor can be used to read parts of the content and assign those parts to NiFi FlowFile attributes. It will not remove the header from the content; that is still done by the SplitText processor. However, every FlowFile created by SplitText inherits the FlowFile attributes of its parent.
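For illustration only (the two-line header layout, column names, and attribute names are assumptions), here is the shape of regex an ExtractText configuration might use to capture the header lines into attributes, which the split children then inherit:

```python
import re

# Assumed input: a two-line header (a title line, then column names),
# followed by data rows.
content = """Daily load 2017-01-10
id,city,loaded_on
1,Lagos,2017-01-10
2,Abuja,2017-01-10
"""

# ExtractText-style pattern: capture line 1 and line 2 into two groups.
# (In ExtractText, each dynamic property holds one such regex.)
header_pattern = re.compile(r"^(.*)\n(.*)\n")
m = header_pattern.match(content)
attributes = {"header.title": m.group(1), "header.columns": m.group(2)}

# SplitText (Header Line Count = 2) then emits one FlowFile per data row;
# each child inherits the attributes extracted from the parent.
data_rows = content.splitlines()[2:]
flowfiles = [{"content": row, **attributes} for row in data_rows]
print(flowfiles[0]["header.columns"])  # id,city,loaded_on
```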

Matt

Super Mentor

@Joshua Adeleke

If you found this information helpful in guiding you with your dataflow design, please accept the answer.

Expert Contributor

@Matt

I have not yet found a way to remove the headers in the CSV file.