Created 06-07-2018 07:26 PM
Hi,
I'm unable to establish connection between Mysql and Hive due to DBCP connection pool .This is my configuration settings in DBCP connection pool ( please refer attached) . I'm getting error says (refer attached ) dbcp1.jpg dbcp2.jpg
Query database table -> put hive streaming processors is used . and Query Database table is scheduled to run every three minutes .
I have restarted Nifi by disabling the DBCP connection pool and tried . Still the same and failed .
Also Is it possible to insert updated values + New values in Mysql to Hive ( which processor should I use)
@Matt Clarke @Matt Burgess . Could you please have a look here please . what am I doing wrong .
Thanks All
Created 06-07-2018 10:07 PM
Try without file:// in your Database Driver Location(s) value i.e
/tmp/mysqljar/mysql-connector-java-5.1.46.jar
Check permissions and driver is on all the NiFi cluster nodes.
-
Is it possible to insert updated values + New values in Mysql to Hive?
Yes, it's possible only if you are having some sort of identifier field that you could tell it's an updated record in the source table.
Usually when record gets updated/created in the RDBMS tables, add current timestamp to the record to the record then in NiFi using QueryDatabaseTable processor Maximum-value Columns property with the timestamp field then the processor will store the state and pulls only the incremental records(updated records+ New records).
Created 06-07-2018 10:07 PM
Try without file:// in your Database Driver Location(s) value i.e
/tmp/mysqljar/mysql-connector-java-5.1.46.jar
Check permissions and driver is on all the NiFi cluster nodes.
-
Is it possible to insert updated values + New values in Mysql to Hive?
Yes, it's possible only if you are having some sort of identifier field that you could tell it's an updated record in the source table.
Usually when record gets updated/created in the RDBMS tables, add current timestamp to the record to the record then in NiFi using QueryDatabaseTable processor Maximum-value Columns property with the timestamp field then the processor will store the state and pulls only the incremental records(updated records+ New records).
Created on 06-08-2018 08:45 PM - edited 08-17-2019 08:01 PM
We cannot load the sample table that you have shared incrementally because
let's take you have used QueryDatabaseTable and given max value column as id then processor will pulls(id's 1,2,3) in the first run and stores the state as 3 after the first run.
Second run processor pulls only the records that are greater than id>3 now we are not able to pull 1,3 records as they are updated ,Processor will pull only id=4 and updates the state to 4.
1.How can we pull data incrementally from RDBMS table then?
Let's consider your table having one more extra column i.e update_at(timestamp type) this field will be updated with system timestamp when there is new records inserted/created in the table (or) records got updated in the table.
Initially you are having 3 records that are created at 2017-01-01 12:00:00.00 and then id's 1,3 updated and 4 created with new updated_at timestamp i.e 2018-06-08 12:00:00.00.
Now you have configured QueryDatabaseTable processor with max value column as updated_at so
By using this table structure we are going to capture all the updates that are happening on the source table to Hive table.
2.On the second load , We have to remove the values 3 and update the value 1 and insert the value 4 . How that is possible in Nifi . Is that at possible ?
Update/Deletes in Hive through NiFi is not yet possible(hive natively don't support update/deletes) every thing will be appends to the hive table.
Approach1:
Look into Hive Acid Merge strategy described in this link will best fit for your needs.
In this article describes merge strategy
To use this merge strategy your source table needs to be designed in a way that you can capture all the updates that are happening in the table.
The final table in hive needs to be Acid enabled table to support merge functionality and Make sure your hive version supports this merge strategy.
Approach2:
-
Click on Accept button below, if you feel like the answer addressed your question..!!
Created 06-08-2018 06:58 PM
Hi
I was able to establish a connection between mysql and Hive . However the purpose is not solved yet . I was trying to update and insert values into Hive . For Example ( refer attached) .
In the example : . On Initial load Hive has loaded the values of ID 1,2 and 3 . On the second load , We have to remove the values 3 and update the value 1 and insert the value 4 . How that is possible in Nifi . Is that at possible ?
Could please guide me on this If you can.
thanks ...
Created 06-09-2018 03:24 AM
Fantastic and detailed reply. I would try this out and reply if that works .Thanks a lot @Shu