I want to set up an ODS (operational data store) in Hive to sync data from our MySQL DB. I noticed that Apache NiFi can help set up a visual data pipeline. How can I use Apache NiFi to build a generic pipeline that streams real-time MySQL changes from the binlog to Apache Hive / HDFS, so that the data can be queried with Hive? Do I need to use Hive Streaming? Thanks!
I found examples such as "Change Data Capture (CDC) with Apache NiFi", but they don't provide a generic approach: the "JsonPathReader" controller service has to be configured to parse the data table by table. I am looking for a generic way that leverages a schema registry to parse the data. By the way, how can Hive Streaming be used with PutHDFS?