Created 03-29-2023 12:51 AM
I am trying to ingest a SQL table into Hive using NiFi. This is easily done with a QueryDatabaseTable processor. However, the records in the SQL table can be modified by hand, so they are not static.
Is there any way to update the data in HDFS according to the changes made in the SQL table?
Created 03-29-2023 03:00 AM
The CaptureChangeMySQL processor would be a good fit for your requirement.
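Note that CaptureChangeMySQL tails the MySQL binary log, so the source database must write its binlog in row format. A minimal sketch, assuming binary logging is already enabled (log_bin) and you have the privileges to change global variables:

SET GLOBAL binlog_format = 'ROW';   -- row-level binlog events are required for CDC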
If this response helped resolve your issue, please take a moment to click "Accept as Solution" below this post.
Thank you
Created on 03-29-2023 03:08 AM - edited 03-29-2023 03:10 AM
I'm sorry, I forgot to specify that I'm working with MS SQL Server, not MySQL. Is the CaptureChangeMySQL processor valid for MS SQL Server?
Created 03-29-2023 05:47 AM
No. For MS SQL Server you can instead evaluate GenerateTableFetch --> ExecuteSQL,
where the GenerateTableFetch "Maximum-value Columns" property can help.
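For illustration, with "Maximum-value Columns" set to a column such as id, GenerateTableFetch emits incremental, paged queries roughly along these lines (a sketch only; my_table, id, and the literal values are hypothetical):

SELECT *
FROM my_table
WHERE id > 1000      -- 1000 = the highest id observed on the previous run
ORDER BY id
OFFSET 0 ROWS FETCH NEXT 10000 ROWS ONLY;   -- page size = "Partition Size" property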
If this response helped resolve your issue, please take a moment to click "Accept as Solution" below this post.
Thank you
Created on 03-29-2023 06:05 AM - edited 03-29-2023 06:05 AM
I'm afraid GenerateTableFetch does not help me either, because it uses the highest value in a column to decide which newly added records to read from the SQL server.
In my case, I'm trying to extract already existing records that have recently been modified. If there is no processor that can deal with this directly, I may add a column to the table view that shows when each record was last modified (i.e. LastModificationDateTime) and use that column as the "Maximum-value Columns" value, as sketched below.
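A sketch of that idea in T-SQL, assuming a hypothetical table dbo.MyTable with key column Id (the column, constraint, and trigger names below are placeholders):

-- Add a last-modified column, defaulted for existing rows...
ALTER TABLE dbo.MyTable
    ADD LastModificationDateTime DATETIME2 NOT NULL
        CONSTRAINT DF_MyTable_LastMod DEFAULT SYSUTCDATETIME();
GO

-- ...and keep it current whenever a row is updated by hand.
CREATE TRIGGER dbo.trg_MyTable_LastMod
ON dbo.MyTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET LastModificationDateTime = SYSUTCDATETIME()
    FROM dbo.MyTable AS t
    INNER JOIN inserted AS i ON i.Id = t.Id;
END;
GO

GenerateTableFetch could then use LastModificationDateTime as its "Maximum-value Columns" value, so modified rows are picked up alongside newly inserted ones.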
Back to the second part of my question, regarding updating these records in HDFS: how can I deal with this? I have the following approach in mind:
1 - Fetch and read the Parquet file where the "ChangedID" record was stored.
2 - Modify the record in that file.
3 - Save or replace that file in HDFS.
Is this approach correct? Is there any better solution?
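One alternative that avoids rewriting Parquet files by hand, if the target can be a transactional (ACID) Hive table instead (Hive ACID requires ORC storage, not Parquet), is to land the changed rows in a staging table and let Hive apply them with MERGE. A rough sketch in HiveQL, with hypothetical table and column names:

MERGE INTO target_table AS t
USING staging_changes AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET col1 = s.col1, col2 = s.col2
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.col1, s.col2);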