PutDatabaseRecord really slow

Explorer

Hi there,

I am experiencing really slow ingestion with the PutDatabaseRecord processor.
The flow is simple - query data from MySQL 5.6 and ingest it into Redshift.
Currently I'm just testing it, so I list only specific tables from those two databases and then query 'select * from $db.tablename' - but ingestion using PutDatabaseRecord is terribly slow (100 rows in 5 minutes).
I am using an AvroReader inside PutDatabaseRecord but didn't make any changes to the AvroReader - I just make sure the flow files stay small (100 rows).
Does AvroReader split the 100 rows into 100 SQL statements?
How can I fix this to get better performance?
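Just to illustrate what I mean by 100 statements - something like this (hypothetical table and columns), one INSERT per record in the flow file:

INSERT INTO public.example_table (id, name) VALUES (1, 'a');
INSERT INTO public.example_table (id, name) VALUES (2, 'b');
-- ...and so on for each of the 100 records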

[five screenshots attached]

4 Replies

Community Manager

@LiranJP Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our NiFi experts @mburgess and @MattWho, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator



Master Guru

How is your DBCPConnectionPool configured?

Explorer

That's the Redshift connection:
[screenshots of the Redshift connection pool configuration]

That's the MySQL connection:
[screenshots of the MySQL connection pool configuration]

I do understand that INSERTs on Redshift can be slow, and that's why people use the COPY command, but 100 rows in 5 minutes is weird.
I know that for BigQuery, for example, there's a dedicated processor for loading data, but I couldn't find anything for Redshift, which is why I am using PutDatabaseRecord.

Edit - After I loaded 100 rows, I exported the same 100 rows into a single INSERT SQL statement and ran it directly on Redshift; that took 1.1 seconds.
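For reference, the exported statement was a single multi-row INSERT roughly along these lines (hypothetical table, columns, and values), so the whole load was one round trip:

INSERT INTO public.example_table (id, name)
VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- in the real test, all 100 rows went into the one VALUES list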
Edit2 - Since I'm the kind of guy who loves doing my job and finding solutions, I found this:
reWriteBatchedInserts=true;reWriteBatchedInsertsSize=102400;
I found it on some Reddit post by chance. It's working way better now, but I couldn't find any details about those parameters to tune them properly, and it's still not ideal (still a little bit slow).
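In case it helps, I'm assuming those parameters get appended to the DBCPConnectionPool's Database Connection URL (I'm using the Amazon Redshift JDBC driver; the endpoint and database below are made up):

jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev;reWriteBatchedInserts=true;reWriteBatchedInsertsSize=102400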

I am not sure that's the solution, so I'm still waiting for your input as well.
Also, if I wanted to change the flow to drop the Avro flow files into S3 and then execute a COPY command from NiFi, is that possible?
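As a rough sketch of what I have in mind (bucket, IAM role, and table names are made up, and I haven't tested this), the statement that a PutSQL or ExecuteSQL processor would run after PutS3Object could look like:

COPY public.example_table
FROM 's3://example-bucket/nifi/exports/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
FORMAT AS AVRO 'auto';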

Explorer

@mburgess Any update from your side?