Created 09-20-2017 08:09 AM
I want to set up an ODS (operational data store) in Hive to sync data from our MySQL DB. I noticed that Apache NiFi can help set up a visual data pipeline. How can I use Apache NiFi to build a generic pipeline that streams real-time MySQL changes from the binlog to Apache Hive / HDFS, so that the data can be queried with Hive? Do I need to use Hive Streaming? Thanks!
Created 09-20-2017 09:38 AM
Use the CDC processor for MySQL (CaptureChangeMySQL), then ConvertAvroToORC, then PutHDFS. That is a nice option. Search the articles here and you'll see a few examples of this approach, and also of using Hive Streaming.
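To make the table-agnostic idea concrete, here is a minimal Python sketch (not a NiFi template) of how one CDC event could be flattened into a row and routed to a per-table HDFS directory without any table-specific code. The event layout ("type", "database", "table_name", and a "columns" list with "name" and "value") is assumed to resemble the JSON that the MySQL CDC processor emits; the base path `/data/ods`, the sample table, and the helper name `flatten_cdc_event` are hypothetical.

```python
import json

# Hypothetical HDFS base directory for the ODS; adjust to your cluster.
ODS_BASE_DIR = "/data/ods"


def flatten_cdc_event(event):
    """Turn one CDC event into (hdfs_dir, row) without table-specific code.

    Assumed event layout: "type", "database", "table_name", an optional
    "timestamp", and a "columns" list whose items carry "name" and "value".
    """
    # Build the row generically from whatever columns the event carries.
    row = {col["name"]: col.get("value") for col in event.get("columns", [])}
    # Keep the operation type and event time so downstream queries can
    # reconstruct the latest state of each record.
    row["_op"] = event["type"]
    row["_ts"] = event.get("timestamp")
    # One directory per database and table, derived from the event itself.
    hdfs_dir = "{}/{}/{}".format(
        ODS_BASE_DIR, event["database"], event["table_name"]
    )
    return hdfs_dir, row


# Made-up sample event, for illustration only.
sample_event = json.loads("""
{
  "type": "insert",
  "timestamp": 1505894940000,
  "database": "shop",
  "table_name": "orders",
  "columns": [
    {"id": 1, "name": "id", "value": 7},
    {"id": 2, "name": "status", "value": "paid"}
  ]
}
""")

hdfs_dir, row = flatten_cdc_event(sample_event)
print(hdfs_dir)
print(row)
```

In an actual flow, the same routing would more likely be expressed with flowfile attributes and Expression Language in the PutHDFS Directory property than in Python; the sketch only shows that nothing in the logic has to be written per table.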
Created 09-20-2017 09:51 AM
I found examples like "Change Data Capture (CDC) with Apache NiFi", but they don't provide a generic approach. The "JsonPathReader" controller service has to be configured to parse the data table by table. I am looking for a generic way to parse the data using a schema registry. By the way, how do I use Hive Streaming with PutHDFS?
Created 10-09-2017 02:03 PM
Can anyone provide a sample Apache NiFi template? Thanks!
Created 10-09-2017 02:04 PM
And how can I monitor the performance of Apache NiFi?