- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Ingest modifiable data from SQL to HDFS using NiFi
- Labels:
-
Apache Hive
-
Apache NiFi
Created ‎03-29-2023 12:51 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I trying to ingest a SQL Table to Hive using NiFi. This is easily done by using a QueryRecordTable. However, the records in the SQL table can be modified by hand, so they are not static.
Is there any way to update the data in HDFS according to the changes made in the SQL table?
Created ‎03-29-2023 03:00 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
CaptureChangeMySQL processor would be fit for your requirement.
If you found this response assisted with your issue, please take a moment and click on "Accept as Solution" below this post.
Thank you
Created on ‎03-29-2023 03:08 AM - edited ‎03-29-2023 03:10 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm sorry, i forgot to specify that I'm working with MS SQL Server, not with MySQL. Is the CaptureChangeMySQL processor valid for MS SQL Server?
Created ‎03-29-2023 05:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
No, So you can evaluate GenerateTableFetch -->ExecuteSQL
Where GenerateTableFetch "Maximum-value Columns" setting can help. refr
If you found this response assisted with your issue, please take a moment and click on "Accept as Solution" below this post.
Thank you
Created on ‎03-29-2023 06:05 AM - edited ‎03-29-2023 06:05 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm afraid that the GenerateTableFetch does not help me either because it uses the highest value on a column to determine whether to read new added records in the SQL server.
In my case, I'm trying to extract the already existing records that have recently been modified. If there is no processor than can directly deal with this, i may add a column to the table view which displays the time when the record was last modified (i.e. LastModificationDateTime) and use this column to retain the "Maximum-value Columns" value.
Back to the second part of my doubt, regarding the updating of this record in HDFS, how can I deal with this? I have the following approach in mind:
1 - Fetch and Read the parquet file where the "ChangedID" was stored.
2 - Modify the record in that file.
3 - Save up or replace that file in HDFS
Is this approach correct? Is there any better solution?
