Hello,
I am transferring data from an Oracle database to HDFS in Parquet format. As shown in the attached screenshot, the flow uses GenerateTableFetch to ingest the data in segments, ExecuteSQL to run the generated queries, UpdateAttribute to add an attribute, and QueryRecord to add a new column to the flow files.
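For context, the QueryRecord statement is roughly the following sketch (the new column name and its value are placeholders, not my exact ones; QueryRecord addresses the incoming flow file as FLOWFILE):

    /* Adds one constant column to every record; 'source_system' and
       its literal value stand in for my real column and value. */
    SELECT *, 'oracle_export' AS source_system
    FROM FLOWFILE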
The source table contains approximately 22 million records. In GenerateTableFetch, the 'Date' column is configured as the Maximum-value Column and the Partition Size is set to one million rows. With this setup, all 22 million records were transferred and stored in HDFS.
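My understanding is that each page fetched by ExecuteSQL has roughly the shape below (a sketch only, assuming the Oracle 12+ database adapter; MY_TABLE is a placeholder, and :max_date stands for the highest "Date" value captured when the queries were generated):

    -- One generated partition of 1M rows, paged by OFFSET/FETCH
    SELECT *
    FROM MY_TABLE
    WHERE "Date" <= :max_date
    ORDER BY "Date"
    OFFSET 2000000 ROWS FETCH NEXT 1000000 ROWS ONLY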
However, during data quality checks I found both missing rows and duplicate records in the HDFS copy, even though the source database has an accurate, duplicate-free record count.
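In sketch form, the checks I ran look like this ("id" stands in for the table's primary key, and my_table_parquet is a Hive external table over the Parquet files in HDFS; names are placeholders):

    -- Compare total vs. distinct counts against the Oracle side
    SELECT COUNT(*) AS total_rows, COUNT(DISTINCT id) AS distinct_rows
    FROM my_table_parquet;

    -- List keys that appear more than once in HDFS
    SELECT id, COUNT(*) AS copies
    FROM my_table_parquet
    GROUP BY id
    HAVING COUNT(*) > 1;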
Could you assist in pinpointing the potential reasons for these discrepancies?

