Created 07-07-2022 08:52 PM
Hi,
I have to work on nifi data flow to migrate data from one database to another database from multiple schemas. I have to migrate only specific set of tables with a filtering applied to each of the tables. I came up with a sample flow where i used one ExecuteSql Processor for each table that has to be migrated. In the Sql query for the processor i applied the filter criteria and then connected that processor to an individual PutDatabaseRecord processor for each destination table.
Just want to understand is that the right approach to have separate processors for each source and destination table? Also would appreciate some guidance on what are the best processors to be considered for data migration from one db to another? This will be a continuous migration to be used in Production environment that will keep running for a couple of months.
Appreciate any help.
Thanks
Created on 07-09-2022 05:24 AM - edited 07-09-2022 05:30 AM
It depends on your use case - how many tables, how much records each table has, one time migration vs cdc etc.
For one time migration (for till date) , if data is huge then its better to run sql paginated query instead of full table select so that resulting flow file will not be very huge and hence no impact on nifi performance.
If you have multiple tables then may be you can add a generate flow file processor for triggering the flow with all the tables names in json and split the json and send each table name to ExecuteSQLQuery processor to execute and get data..
If CDC need migrated then may be you can schedule the flow very frequently and query the data based on datetime range if possible or any incremental values (which can be cached using nifi cache for new run)
Created 07-10-2022 04:15 AM
@hegdemahendra Thank you for your reply.. It will be a continuous migration that will run for few months... Once all the historic data is migrated, it will be a CDC. The no of tables could be close to 20-30 tables.. I dont know yet about the volume but it will be huge... Based on your response, the solution looks like the combined approach for multiple tables and CDC. Could you please share little more details about this approach?