Support Questions

Find answers, ask questions, and share your expertise

ExecuteSQL does not let go of threads

avatar
Explorer

Hi,

 

I am using the ExecuteSQL processor in NiFi to write records to PostgreSQL.

The problem I have is when running on multi-thread (for example 10), the processor gets stuck and runs too slow. This doesn't happen in a single thread. However, single thread is not efficient so doesn't work for me.

After getting slow, I stop the processor, terminate it, and start it again. For almost 5-10 seconds everything is ok and then the same problem.

I also have a similar problem with the InvokeHTTP processor.

 

Thank you in advance.

1 ACCEPTED SOLUTION

avatar
Master Mentor

@garoosy 

You should look in to using the "ExecuteSQLRecord" instead of "ExecuteSQL" for large volume data.  To be efficient here you  would have many records in a single FlowFile.

Right now you have a single record per each FlowFile which is not going to be very efficient.  The only way for "ExecuteSQL" to handle multiple FlowFile executions in a single connection is if the SQL statement used in every FlowFile is identical.  In order to do that the unique values would need to come from FlowFile attributes.

You may find these post helpful:
https://community.cloudera.com/t5/Support-Questions/Nifi-ExectueSQL-how-to-force-a-parameter-to-be-a...

https://stackoverflow.com/questions/63330790/using-nifi-executesqlrecord-with-parameterized-sql-stat...

 

If you have threads that never seem to complete (will see small number in upper right corner of processor (2)), it is best to get a series of thread dumps (4 - 6) to verify thread is not progressing.  Then you have to determine if what the thread is waiting on.  Did you try setting a "Max Wait Time" on the processor?  It defaults to 0 which means it would wait forever.

 

Hope this helps,

Matt

View solution in original post

4 REPLIES 4

avatar
Master Mentor

@garoosy 

You should look in to using the "ExecuteSQLRecord" instead of "ExecuteSQL" for large volume data.  To be efficient here you  would have many records in a single FlowFile.

Right now you have a single record per each FlowFile which is not going to be very efficient.  The only way for "ExecuteSQL" to handle multiple FlowFile executions in a single connection is if the SQL statement used in every FlowFile is identical.  In order to do that the unique values would need to come from FlowFile attributes.

You may find these post helpful:
https://community.cloudera.com/t5/Support-Questions/Nifi-ExectueSQL-how-to-force-a-parameter-to-be-a...

https://stackoverflow.com/questions/63330790/using-nifi-executesqlrecord-with-parameterized-sql-stat...

 

If you have threads that never seem to complete (will see small number in upper right corner of processor (2)), it is best to get a series of thread dumps (4 - 6) to verify thread is not progressing.  Then you have to determine if what the thread is waiting on.  Did you try setting a "Max Wait Time" on the processor?  It defaults to 0 which means it would wait forever.

 

Hope this helps,

Matt

avatar
Master Guru

You generally don't need to run multiple threads for that.

 

Also make sure you size your nodes and cluster appropriately.

 

https://www.datainmotion.dev/2020/07/sizing-your-apache-nifi-cluster-for.html

avatar
Explorer

Thanks for your reply.

I will try it.

avatar
Explorer

Thank you Matt!

Altering the "Max Wait Time" value was a game-changing move. I still need to improve it. But the thread problem is fixed now.