I have a 3-node cluster of NiFi instances running, backed by ZooKeeper. I am attempting to run an ETL pipeline that copies data from S3 into a staging table in Redshift and then performs Redshift's version of an UPSERT (DELETE the overlap, INSERT from staging).
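For reference, the merge pattern I'm describing looks roughly like this (table and column names here are placeholders, not my actual schema):

```sql
-- Run as a single transaction so readers never see a half-merged table.
BEGIN;

-- Delete target rows that overlap the staged batch.
DELETE FROM target_table
USING staging_table
WHERE target_table.id = staging_table.id;

-- Insert the full staged batch.
INSERT INTO target_table
SELECT * FROM staging_table;

END;
```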
My issue is that whenever the query is sent from the master node, it fails. The failure is reported in NiFi, but when I look in the Redshift "Queries" pane, I can see that the queries were actually executed. The failure message in the NiFi bulletin is:
"Failed to update database due to a failed batch update. There were a total of 1 FlowFiles that failed, 0 that succeeded, and 0 that were not execute and will be routed to retry;"
The security groups allow all three nodes to execute queries (all nodes are executing different queries that succeed). Upon failure I grabbed the query being executed and ran it in DataGrip, and it executes normally. Is there possibly some sort of driver/connection cache somewhere that I could clear to try to get the master node back to a functioning state?
Oh, and by the way, if I switch the PutSQL processor to execute on the master only, the queue just backs up and the processor never picks up any of the FlowFiles. Weird, huh?! Below, 10.50.10.83 is the master.
There should be more information about the failure in the logs, directly after that same "Failed to update database due to..." message; can you provide the extra error information? Also, which version of NiFi are you using?
Have you performed a listing on the queue feeding your PutSQL processor and looked at the Position 1 FlowFile it is trying to process on your failing node?
Does the content of that FlowFile look to be correctly formed?