Created 10-25-2024 02:46 PM
I am running into an issue where I am unable to successfully use a PutSQL or ExecuteSQL after returning from a Batch Processor group.
The general scenario is this: I need to watch a folder for files that will be created with information to be uploaded to a PostgreSQL database. The files will have multiple lines, each line is to be inserted as a separate record into a temporary/staging table. When all the records from that file have been uploaded, I need to call a stored procedure that will process all those records and update other permanent tables all at once.
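Independent of NiFi, the pattern described above is a staging-table batch load followed by a single set-based processing step. A minimal sketch (Python with the standard-library sqlite3 driver standing in for PostgreSQL; the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (line TEXT)")
conn.execute("CREATE TABLE permanent (line TEXT)")

# Step 1: each line of the incoming file becomes one staging record.
file_lines = ["rec-1", "rec-2", "rec-3"]
conn.executemany("INSERT INTO staging VALUES (?)", [(l,) for l in file_lines])

# Step 2: once the whole file is loaded, process all staged records at once
# (in PostgreSQL this would be the single CALL to the stored procedure).
conn.execute("INSERT INTO permanent SELECT line FROM staging")
conn.execute("DELETE FROM staging")
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM permanent").fetchone()[0])  # 3
```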
I came across the concept of using a Process Group with its 'Process Group Outbound Policy' set to 'Batch Output'.
So, I am able to use GetFile to get each file as it is produced, then pass it into my batch processor group, where it splits the lines into individual flow files, and performs the inserts into my staging table. That all works perfectly.
I then use a MergeContent processor to reduce those split files back into a single flow file, so I can subsequently use either ExecuteSQL or PutSQL to call my stored procedure - once.
But I then receive an error stating, "Failed to update database for FlowFile[filename=MYFILE.TXT] due to The column index is out of range: 1, number of columns: 0.; it is possible that retrying the operation will succeed, so routing to retry: org.postgresql.util.PSQLException: The column index is out of range: 1, number of columns: 0."
The strange thing is that it doesn't matter what SQL statement I am trying to use, whether a 'CALL' to my stored procedure, or a SELECT statement to any table, I get the exact same error. I also tried a test to immediately use a PutSQL to insert records to another test table, after returning from the process group, prior to the MergeContent processor, so that's not the issue either.
Even more strange is that I can take the same SQL processor(s) and wire it into the flow prior to the batch process group, and it works just fine.
I have other SQL-related processors at the main level, plus the PutSQL processor within the Processor Group that performs each individual insert, and they all work fine. And they all use the same JDBC Connection Pool.
So what's the issue? Do I have something misconfigured? Is something not getting passed in and out of the Process Group?
Or, considering I'm using NiFi version 2.0.0-M4, is there a bug somewhere?
Thanks.
Created 10-29-2024 03:05 AM
Hi @rturn ,
It's an interesting case indeed. I'm not sure if this is intended by design or if it's an actual bug.
The fact that this happens only after the flowfiles exit the process group leads me to believe that whatever you are doing in there is indirectly causing this error.
Explanation:
- Typically when you get such an error, it means you are passing a number of argument values to a query that doesn't have a matching number of placeholders. Your query doesn't list any column placeholders (?), hence the value 0. The question is: where is the 1 argument coming from?
- If you read the PutSQL processor description, it allows for argument value placeholders using the question mark character (?), with each argument's type/value tuple provided as flowfile attributes in the form sql.args.N.type and sql.args.N.value, where N is the position of the argument placeholder in the SQL command.
- Given your data flow, it seems you are executing INSERT statements in the upstream process group while utilizing argument value placeholders.
- When doing the merge (regardless of how you set the Attribute Strategy), the merged output seems to keep the sql argument type and value attributes that are common (same name and value) across the merged flowfiles.
- The PutSQL behavior (this is where it's a potential bug) seems to be to always pass any provided sql arguments, regardless of whether the actual query has placeholders. When the JDBC driver tries to execute the query, it sees that you are providing arguments but there are no placeholders, and it throws this exception.
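This failure mode can be reproduced outside NiFi. The sketch below uses Python's standard-library sqlite3 driver as a stand-in for the PostgreSQL JDBC driver: supplying a bind argument to a statement that contains no `?` placeholders is rejected by the driver, the same class of error PutSQL surfaces here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, val TEXT)")

# With a matching placeholder count, binding works as expected.
conn.execute("INSERT INTO staging VALUES (?, ?)", (1, "ok"))

# A statement with zero placeholders but one supplied argument fails,
# analogous to PutSQL passing stray sql.args.* attributes to a plain
# CALL or SELECT statement.
try:
    conn.execute("SELECT * FROM staging", ("stray-argument",))
except sqlite3.ProgrammingError as e:
    print("driver rejected the bind:", e)
```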
Mitigation:
1- One way to mitigate this issue is to add an UpdateAttribute processor before the final PutSQL to delete all sql argument attributes, using the following regex in the "Delete Attributes Expression" property:
sql\.args\.[0-9]+\.(type|value)
2- Passing attributes through Process Group concurrency and a merge can indirectly cause unintended consequences in downstream processors. As an alternative, you can consider something like the Wait/Notify pattern, though that would require a redesign of the dataflow.
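The regex in option 1 can be sanity-checked outside NiFi. The sketch below (Python's `re` module; the attribute map is hypothetical) shows that the pattern matches exactly the sql.args.N.type and sql.args.N.value attributes while leaving other attributes alone:

```python
import re

# The same pattern proposed for UpdateAttribute's "Delete Attributes Expression".
pattern = re.compile(r"sql\.args\.[0-9]+\.(type|value)")

# Hypothetical attribute map on the merged flowfile.
attributes = {
    "filename": "MYFILE.TXT",
    "sql.args.1.type": "12",
    "sql.args.1.value": "some-value",
    "merge.count": "42",
}

# Keep only attributes that do NOT fully match the pattern,
# mirroring what UpdateAttribute's delete expression would do.
kept = {k: v for k, v in attributes.items() if not pattern.fullmatch(k)}
print(sorted(kept))  # ['filename', 'merge.count']
```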
@MattWho @stevenmatison , Do you think this qualifies to be reported as a bug?
Steps to Reproduce:
1- Create a starting processor like GetFile or GenerateFlowFile.
2- Add an UpdateAttribute processor to create sql argument attributes (e.g. sql.args.1.type and sql.args.1.value).
3- Add a PutSQL processor that executes any arbitrary CRUD statement that doesn't have any placeholders.
Resolution:
Check the SQL command for placeholders, and if there are none, don't pass any arguments.
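The proposed fix can be sketched as a small guard. This is a hypothetical helper, not the actual PutSQL code, and the placeholder check is naive (it would miscount a `?` inside a string literal):

```python
def args_to_bind(sql: str, args: dict) -> dict:
    """Return the sql.args.* attributes to bind, but only when the
    statement actually contains '?' placeholders."""
    if sql.count("?") == 0:
        return {}  # no placeholders: drop any stray arguments
    return args

# A plain CALL with no placeholders gets no arguments bound.
print(args_to_bind("CALL process_staging()", {"sql.args.1.value": "x"}))  # {}
# An INSERT with a placeholder keeps its arguments.
print(args_to_bind("INSERT INTO t VALUES (?)", {"sql.args.1.value": "x"}))
```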
If that helps please accept the solution.
Thanks
Created 10-25-2024 03:17 PM
@rturn Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our NiFi experts @SAMSAL @MattWho who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created 10-25-2024 07:54 PM
Do you mind posting a screenshot of your data flow and the configuration of the main processors? It doesn't need to be the exact thing; if you can simplify it so that you can still replicate the error, that would be fine. This will help us better troubleshoot the issue and, if needed, replicate it in our environment to see what is going on.
Created 10-28-2024 02:36 PM
Here is the overall data flow. The second and third processors aren't relevant to the issue, but I'll just leave them there.
The group to the upper right, highlighted in the yellow box, is just something I added to help debug the issue. The 'PutSQL' is copied from the one at the bottom of the screen. When the whole process ran, the one at the bottom would fail. But I could then start the cloned one, which had not traversed the 'Data Uploader' process group, and it would work successfully.
The PutSQL's configuration:
And the 'Data Upload' Process Group settings:
Let me know if you need more.
Thanks.
Created 10-30-2024 08:10 AM
For reference, I tried solution #1, adding an UpdateAttribute processor that deletes the sql argument attributes. That did it.
Thanks!