I have a huge NiFi process group with 50+ parallel flows. The flow automatically picks up CSVs from an S3 bucket and inserts them into the database. I have TimeOfDay and CSV_ID as a composite primary key: TimeOfDay can repeat across CSV files, but the (CSV_ID, TimeOfDay) pair is unique. However, the final processor, i.e. PutSQL, is throwing a primary key constraint error. I don't know the cause of this; it could be that the processor is looking at ToD and CSV_ID individually. I am looking for a way to fix this, since the architecture I want is in place but the primary key constraint is causing a major blockage.
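For reference, if the key really was created as a composite, the database should only reject a repeated (TimeOfDay, CSV_ID) *pair*; rejections on TimeOfDay alone would suggest the key was actually defined on a single column. A minimal sketch of the expected behaviour, using Python's sqlite3 as a stand-in for MySQL (the column names come from the question; the table name, types, and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        TimeOfDay TEXT NOT NULL,
        CSV_ID    TEXT NOT NULL,
        value     REAL,
        PRIMARY KEY (TimeOfDay, CSV_ID)  -- composite key: the PAIR must be unique
    )
""")

# Same TimeOfDay from two different CSVs: both inserts succeed.
conn.execute("INSERT INTO readings VALUES ('08:00', 'csv_a', 1.0)")
conn.execute("INSERT INTO readings VALUES ('08:00', 'csv_b', 2.0)")

# Repeating the exact (TimeOfDay, CSV_ID) pair violates the key.
try:
    conn.execute("INSERT INTO readings VALUES ('08:00', 'csv_a', 3.0)")
except sqlite3.IntegrityError as e:
    print("duplicate pair rejected:", e)
```

Comparing the failing flowfiles against this behaviour (e.g. via NiFi's data provenance) should show whether duplicates are arriving with genuinely repeated pairs, or whether the table's key does not match what you expect.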
I need to play with the S3 processors a bit to be more helpful, but wondering if there is any issue getting these files in a NiFi cluster and if you should be marking the pull processor to be an "isolated processor" to run only on the "primary node" as Brian describes in his answer to https://community.hortonworks.com/questions/99328/nifi-isolated-processors-in-a-clustered.html. Worth giving it a try first to see if that's part of the problem.
Thank you for your response. I see your point, and I am going to try using an isolated processor, but the problem seems to be with the MySQL instance: it stores the table schema for table 'xyz'. This means that if table 'xyz' has an assigned primary key, say 'p1', and I drop the table, recreate it with the same name and a composite primary key 'p1,k1', and then run my NiFi flow with an updated Avro schema, the PutSQL processor will continue to throw errors, because the original schema for table 'xyz' has been stored in the registry. It seems the only solutions, and the two which have worked for me, are either to rename the table (e.g. to xyz1) or to create the same table in a new database. Of course, I can flush the schema registry and logs, but that would also delete the other tables, which I do not want to do. As per my understanding, the problem lies with the MySQL instance and not with NiFi. Do you think otherwise?
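One quick way to narrow this down is to repeat the drop-and-recreate experiment outside NiFi and insert directly: if the database itself accepted the new composite key, the stale schema must be cached somewhere upstream. A sketch of that check, again using sqlite3 as a stand-in for MySQL (the names p1, k1, and xyz come from the discussion; the data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original table: single-column primary key on p1.
conn.execute("CREATE TABLE xyz (p1 TEXT PRIMARY KEY, k1 TEXT)")
conn.execute("INSERT INTO xyz VALUES ('a', 'x')")

# Drop and recreate with the same name but a composite key (p1, k1).
conn.execute("DROP TABLE xyz")
conn.execute("CREATE TABLE xyz (p1 TEXT, k1 TEXT, PRIMARY KEY (p1, k1))")

# Two rows sharing p1 now insert cleanly: the database is not holding on
# to the old single-column key, so if the flow still fails, the stale
# schema is being cached outside the database (e.g. in a registry).
conn.execute("INSERT INTO xyz VALUES ('a', 'x')")
conn.execute("INSERT INTO xyz VALUES ('a', 'y')")
```

If direct inserts succeed but the same rows still fail through the flow, that would point at cached schema in the pipeline rather than at the database itself.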