
Ingest modifiable data from SQL to HDFS using NiFi

New Contributor

I'm trying to ingest a SQL table into Hive using NiFi. This is easily done using a QueryDatabaseTable processor. However, the records in the SQL table can be modified by hand, so they are not static.

Is there any way to update the data in HDFS according to the changes made in the SQL table?

4 REPLIES

Master Collaborator

The CaptureChangeMySQL processor would fit your requirement.

If you found this response helpful, please take a moment and click "Accept as Solution" below this post.

Thank you

New Contributor

I'm sorry, I forgot to specify that I'm working with MS SQL Server, not MySQL. Is the CaptureChangeMySQL processor valid for MS SQL Server?

Master Collaborator

No. Instead, you can evaluate GenerateTableFetch --> ExecuteSQL,

where the GenerateTableFetch "Maximum-value Columns" property can help. Reference:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apach...

Thank you
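The incremental-fetch behaviour behind the "Maximum-value Columns" property can be sketched in plain Python: remember the highest value seen in a tracking column and only request rows above it on the next run. This is a minimal illustration of the idea, not NiFi's actual implementation; table and column names are made up.

```python
# Sketch of the logic behind GenerateTableFetch's "Maximum-value Columns"
# property: track the largest value seen in a column and fetch only rows
# above it on the next run. Names here are illustrative.

def build_fetch_query(table, max_value_column, last_max):
    """Build SQL that fetches only rows past the stored maximum."""
    if last_max is None:
        return f"SELECT * FROM {table}"  # first run: fetch everything
    return f"SELECT * FROM {table} WHERE {max_value_column} > {last_max}"

def advance_state(rows, max_value_column, last_max):
    """Update the stored maximum after a successful fetch."""
    values = [row[max_value_column] for row in rows]
    if last_max is not None:
        values.append(last_max)
    return max(values) if values else last_max
```

Note that, as the later replies point out, this only picks up rows whose tracking-column value grows; plain inserts with an ever-increasing key qualify, while in-place edits do not unless the tracked column itself changes.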

New Contributor

I'm afraid GenerateTableFetch does not help me either, because it uses the highest value of a column to decide whether to read newly added records from the SQL server.

 

In my case, I'm trying to extract already existing records that have recently been modified. If there is no processor that can deal with this directly, I may add a column to the table view that shows when the record was last modified (i.e. LastModificationDateTime) and use this column as the "Maximum-value Columns" value.

 

Back to the second part of my question, regarding updating these records in HDFS: how can I deal with this? I have the following approach in mind:

1 - Fetch and read the parquet file where the "ChangedID" was stored.

2 - Modify the record in that file.

3 - Save or replace that file in HDFS.
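Assuming the whole parquet file is rewritten (parquet files are not updated in place), the three steps above amount to a read-modify-replace upsert. A minimal Python sketch of that pattern, with plain dicts standing in for the file's rows; in a real flow the file would be fetched from and written back to HDFS (e.g. via FetchParquet/PutParquet or similar):

```python
# Sketch of the read-modify-replace upsert the three steps describe.
# Dicts stand in for the parquet file's rows; the key column name
# ("id") is an assumption for illustration.

def apply_update(file_rows, changed_row, key="id"):
    """Return the rewritten file contents with the changed record replaced."""
    by_key = {row[key]: row for row in file_rows}   # 1 - read the file
    by_key[changed_row[key]] = changed_row          # 2 - modify the record
    return list(by_key.values())                    # 3 - contents to write back
```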

 

Is this approach correct? Is there any better solution?