Member since
03-26-2024
2
Posts
1
Kudos Received
0
Solutions
03-26-2024
02:41 PM
the 'date' column in our table is indeed of the TIMESTAMP data type.
... View more
03-26-2024
03:40 AM
1 Kudo
Hello, I am in the process of transferring data from an Oracle database to HDFS, formatted as Parquet. The workflow, as depicted in the attached screenshot, employs the GeneratedTableFetch method for data ingestion in segments. Additionally, the ExecuteSQL processor runs the generated queries, UpdateAttribute is utilized to add an attribute, and QueryRecord is used to introduce a new column in the flow files. The source table contains approximately 22 million records. In this procedure, the 'Date' column is set as the 'Maximum-value Column' with the GenerateTableFetch processor, and a partition size of one million rows has been configured. This approach has allowed for the successful transfer and storage of all 22 million records into HDFS. However, during data quality checks, I encountered some issues: there were missing rows and instances of duplicate records in the HDFS storage, a situation not mirrored in the source database, which maintained an accurate and duplicate-free record count. Could you assist in pinpointing the potential reasons for these discrepancies?
... View more
Labels:
- Labels:
-
Apache NiFi