Created 06-19-2023 03:06 AM
Hi All
@mattclarke, @mattburgess, @markpayne.
We are doing a Postgres-to-Postgres data migration using NiFi: we fetch data from a source PostgreSQL database and write it into a destination PostgreSQL database. While transferring records from source to destination, we were getting duplicate key errors and only about half of the rows were transferred (for example, out of 300 million records only 150 million made it across before the duplicate key error appeared). After researching the issue, we found this was because the sequence was out of sync, so, as suggested by the online community, we set the column name in the Maximum-value Columns property of the GenerateTableFetch processor (it automatically orders by that column in ascending order and enables incremental fetching of records). After this change we no longer get the duplicate key error, and NiFi successfully transferred all 300 million records.

However, in production we have around 2k tables. How do we achieve this across all of the tables? We also need more throughput in terms of the size of the data transferred (at least 1 GB per minute). Please suggest good practices for solving this at the organisational level.
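For reference, this is roughly how we are thinking of enumerating a maximum-value column for every table, so that each table's GenerateTableFetch configuration can be driven from metadata rather than set by hand. It is only a minimal sketch, assuming each table has a single-column integer primary key and that psycopg2 is available; the connection details below are placeholders, not our real environment.

```python
# Minimal sketch: list every table's primary-key column so it could be used
# as the Maximum-value Column for that table's incremental fetch.
# Assumes psycopg2 is installed and each table has a single-column integer PK.
import psycopg2

# Placeholder connection details -- replace with the real source database.
conn = psycopg2.connect(
    host="source-db-host", dbname="sourcedb", user="nifi", password="secret"
)

PK_QUERY = """
SELECT tc.table_schema,
       tc.table_name,
       kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
 AND tc.table_schema = kcu.table_schema
WHERE tc.constraint_type = 'PRIMARY KEY'
ORDER BY tc.table_schema, tc.table_name;
"""

with conn, conn.cursor() as cur:
    cur.execute(PK_QUERY)
    for schema, table, pk_column in cur.fetchall():
        # Each row is a candidate Maximum-value Column for that table.
        print(f"{schema}.{table} -> {pk_column}")

conn.close()
```

We are also aware that ListDatabaseTables can feed GenerateTableFetch (whose table name and related properties accept incoming attributes), which might avoid building a separate flow per table, but we are not sure whether that pattern scales to 2k tables at the throughput we need.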
This is the flow we implemented: https://drive.google.com/file/d/1kGYe-H3Qpd5z3LBp7N31Zc1P7REDuoqa/view?usp=drivesdk
Thanks in advance.
Created 06-19-2023 11:56 PM
@iso8583, Welcome to our community! To help you get the best possible answer, I have tagged in our NiFi experts @MattWho @mburgess @SAMSAL who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur
Created 06-20-2023 12:16 AM
@cotopaul, tagging myself because I am struggling with a similar issue and was not quite able to figure it out myself ... and maybe I will get some hints from some of the answers.