I want to set up an ODS (operational data store) in Hive that syncs data from our MySQL DB. I noticed that Apache NiFi can help build a visual data pipeline. How can I use Apache NiFi to set up a generic pipeline that streams real-time MySQL changes from the binlog to Apache Hive / HDFS, so the data can be queried by Hive? Do I need to use Hive Streaming? Thanks!
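Use the CDC processor for MySQL (CaptureChangeMySQL), then ConvertAvroToORC, then PutHDFS. That is a nice option. Note that CaptureChangeMySQL emits JSON events, so they need to be converted to Avro (for example with ConvertRecord) before ConvertAvroToORC. Search the articles here and you'll see a few examples of this approach, and also of using Hive Streaming.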
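To make the "one pipeline, many tables" idea concrete, here is a minimal Python sketch (not NiFi code) of the routing decision PutHDFS has to make: each table's change events go to their own HDFS directory, so a Hive external table stored as ORC can sit on top of each one. It assumes CDC events shaped roughly like CaptureChangeMySQL's JSON output (`type`, `database`, `table_name` fields); the base path `/data/ods` and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical base directory for the ODS on HDFS; adjust to your cluster.
ODS_BASE_DIR = "/data/ods"

# Only row-level events carry data; begin/commit/ddl events are skipped.
ROW_EVENT_TYPES = ("insert", "update", "delete")


def hdfs_dir_for(event: dict) -> str:
    """Return a per-table target directory, e.g. /data/ods/shop/orders."""
    return f"{ODS_BASE_DIR}/{event['database']}/{event['table_name']}"


def group_by_target_dir(events: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group CDC events so each table's changes land in its own directory."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        if event.get("type") not in ROW_EVENT_TYPES:
            continue
        groups[hdfs_dir_for(event)].append(event)
    return dict(groups)
```

In NiFi itself the same effect would come from setting PutHDFS's directory from FlowFile attributes rather than hard-coding one path per table.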
I found examples like "Change Data Capture (CDC) with Apache NiFi", but they don't provide a generic approach: the JsonPathReader controller service has to parse the data table by table. I am looking for a generic way that leverages a schema registry to parse the data. By the way, how would I use Hive Streaming with PutHDFS?
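One way to avoid per-table JsonPath expressions is to read the generic `columns` array that every CDC event carries and look up the target schema by table name. The Python sketch below illustrates that idea outside NiFi. It assumes each column entry includes `name` and `value` (which requires the CDC processor to have resolved column names), and it uses a plain dict, `SCHEMA_REGISTRY`, as a hypothetical stand-in for a real schema registry; the table, columns, and type mapping are illustrative only.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical stand-in for a schema registry:
# "database.table" -> ordered (column name, Avro primitive type) pairs.
SCHEMA_REGISTRY: Dict[str, List[Tuple[str, str]]] = {
    "shop.orders": [("id", "long"), ("customer", "string"), ("amount", "double")],
}

# Python casts for a few Avro primitive types (illustrative subset).
CASTS = {"long": int, "int": int, "double": float, "float": float, "string": str}


def flatten_cdc_event(raw: str) -> dict:
    """Turn one CDC event (JSON text) into a flat record for its table's schema."""
    event = json.loads(raw)
    table = f"{event['database']}.{event['table_name']}"
    schema = SCHEMA_REGISTRY[table]  # raises KeyError for an unregistered table
    # Index the generic "columns" array by name instead of a per-table JsonPath.
    values = {col["name"]: col.get("value") for col in event["columns"]}
    record = {}
    for name, avro_type in schema:
        value = values.get(name)
        record[name] = None if value is None else CASTS[avro_type](value)
    # Keep the operation and source table so Hive queries can tell changes apart.
    record["_op"] = event["type"]
    record["_table"] = table
    return record
```

Only the registry entry changes per table; the parsing code stays the same, which is the property the generic pipeline needs.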
Can anyone provide a sample Apache NiFi template? Thanks!
And how can I monitor the performance of Apache NiFi?
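Besides the status bar and per-component statistics in the NiFi UI, NiFi exposes a REST API that can be polled. The Python sketch below assumes an unsecured NiFi at `http://localhost:8080/nifi-api` and the `/flow/status` endpoint, whose response is assumed to contain a `controllerStatus` object with `activeThreadCount`, `flowFilesQueued`, and `bytesQueued`; check these names against the REST API docs for your NiFi version. Secured clusters additionally need TLS and an access token.

```python
import json
import urllib.request
from typing import Dict

# Assumption: an unsecured NiFi instance at this address.
NIFI_API = "http://localhost:8080/nifi-api"


def fetch_flow_status(base_url: str = NIFI_API) -> dict:
    """GET /flow/status from the NiFi REST API and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/flow/status", timeout=10) as resp:
        return json.load(resp)


def summarize(status: dict) -> Dict[str, int]:
    """Pick out a few controller-level numbers worth tracking over time."""
    cs = status.get("controllerStatus", {})
    return {
        "active_threads": int(cs.get("activeThreadCount", 0)),
        "flowfiles_queued": int(cs.get("flowFilesQueued", 0)),
        "bytes_queued": int(cs.get("bytesQueued", 0)),
    }
```

Usage: `print(summarize(fetch_flow_status()))` on a schedule. A steadily growing `flowfiles_queued` usually points to a back-pressured step downstream (for example, slow HDFS writes).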