I already have a MySQL table on my local (Linux) machine, and a Hive external table with the same schema as the MySQL table. I want to sync the Hive external table whenever a record is inserted or updated in MySQL. A batch update is fine with me, say hourly. What is the best possible approach to achieve this without using Sqoop?
This is very easily achievable with NiFi. Check out the QueryDatabaseTable or ExecuteSQL processors.
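To illustrate the idea behind QueryDatabaseTable: it remembers the highest value seen in a "maximum-value column" (e.g. an auto-increment id or an update timestamp) and, on each scheduled run, pulls only rows beyond that watermark. Here is a minimal sketch of that pattern; sqlite3 stands in for MySQL so the example is self-contained, and the table and column names are made up for illustration.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the watermark logic is the point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)",
                 [("apple",), ("banana",)])

watermark = 0  # NiFi keeps this in processor state between runs

def fetch_new_rows(conn, watermark):
    """Return rows added since the last run, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id",
        (watermark,)).fetchall()
    new_watermark = rows[-1][0] if rows else watermark
    return rows, new_watermark

# First "hourly" run picks up everything.
rows, watermark = fetch_new_rows(conn, watermark)
print(rows)        # [(1, 'apple'), (2, 'banana')]

# A new insert arrives; the next run fetches only that row.
conn.execute("INSERT INTO orders (item) VALUES ('cherry')")
rows, watermark = fetch_new_rows(conn, watermark)
print(rows)        # [(3, 'cherry')]
```

With a timestamp column as the maximum-value column, updated rows are re-emitted too, which is what makes the hourly batch sync possible.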
If I use the QueryDatabaseTable or ExecuteSQL processor in NiFi, it will create multiple files when records are updated.
I also want to merge the data into the target Hive table. How can I achieve that?
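On the merge question: since each pull re-emits updated rows, the same primary key can appear several times across files. A common pattern (not the only one) is to stage all records and then collapse to the latest version per key before overwriting the target table, e.g. with a `row_number()` window in Hive. The collapse step looks like this in plain Python; the record shape `(pk, updated_at, payload)` is assumed for illustration:

```python
def latest_per_key(records):
    """records: iterable of (pk, updated_at, payload); keep newest per pk."""
    best = {}
    for pk, ts, payload in records:
        # A later timestamp for the same key replaces the earlier version.
        if pk not in best or ts > best[pk][0]:
            best[pk] = (ts, payload)
    return {pk: payload for pk, (ts, payload) in best.items()}

records = [
    (1, "2024-01-01 10:00", "qty=5"),
    (2, "2024-01-01 10:00", "qty=1"),
    (1, "2024-01-01 11:00", "qty=7"),   # later update wins
]
merged = latest_per_key(records)
print(merged)   # {1: 'qty=7', 2: 'qty=1'}
```

In Hive the equivalent is typically an `INSERT OVERWRITE` from a query that keeps `row_number() = 1` per key ordered by the update timestamp descending.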
Hi @Sumit Deshmukh!
I guess you have a few approaches for retrieving the MySQL data, like:
- Use CDC to capture changes from MySQL without being invasive (reading the data from the BINLOG). Then you can use tools like NiFi (recommended) or other CDC tools like Canal from Alibaba. Take a look at the link below.
- Or use a JDBC-based connector like Kafka Connect, and stream the data directly into a Kafka topic.
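For the Kafka Connect route, a JDBC source connector in `timestamp+incrementing` mode covers both inserts and updates. A sketch of the connector config, assuming the Confluent JDBC source connector and placeholder database, table, and column names (`mydb`, `orders`, `updated_at`, `id`):

```json
{
  "name": "mysql-orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/mydb",
    "connection.user": "user",
    "connection.password": "secret",
    "table.whitelist": "orders",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "updated_at",
    "incrementing.column.name": "id",
    "topic.prefix": "mysql-",
    "poll.interval.ms": "3600000"
  }
}
```

The timestamp column catches updates and the incrementing column catches inserts; a downstream consumer (or a Hive sink) can then apply the latest-record-per-key merge into the Hive table.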
Hope this helps! :)