How to use Apache NiFi to set up a generic pipeline that streams real-time MySQL changes (CDC) to Hive?

Explorer

I want to set up an ODS (operational data store) in Hive that stays in sync with our MySQL DB. I noticed that Apache NiFi can help build a visual data pipeline. How can I use Apache NiFi to set up a generic pipeline that streams real-time MySQL changes from the binlog to Apache Hive / HDFS, so that they can be queried with Hive? Do I need to use Hive Streaming? Thanks!

Re: How to use Apache NiFi to set up a generic pipeline that streams real-time MySQL changes (CDC) to Hive?

Super Guru

Use the CDC processor for MySQL (CaptureChangeMySQL), then ConvertAvroToORC, then PutHDFS. That is a nice option. Search the articles here and you'll find a few examples of this approach, as well as examples that use Hive Streaming.
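To make the data flow concrete, here is a minimal Python sketch of what happens to one change event between the CDC processor and the write to HDFS. It assumes the event is JSON with `type`, `timestamp`, `database`, `table_name`, and a `columns` array of `name`/`value` pairs, which is the shape CaptureChangeMySQL is generally described as emitting; check it against the output of your NiFi version. The sample values and the `_cdc_op` / `_cdc_ts` field names are hypothetical, chosen only for illustration.

```python
import json

# Illustrative event; the field layout is an assumption, the values are made up.
SAMPLE_EVENT = json.dumps({
    "type": "insert",
    "timestamp": 1494000000000,
    "database": "shop",
    "table_name": "orders",
    "columns": [
        {"id": 1, "name": "id", "column_type": 4, "value": 42},
        {"id": 2, "name": "status", "column_type": 12, "value": "NEW"},
    ],
})


def flatten_cdc_event(raw):
    """Turn one CDC event into (target_table, flat_row).

    The flat row keeps one key per source column plus two bookkeeping
    fields, so a downstream step can convert it to Avro/ORC and Hive
    can tell inserts, updates, and deletes apart.
    """
    event = json.loads(raw)
    # One entry per column; a missing "value" becomes None.
    row = {col["name"]: col.get("value") for col in event.get("columns", [])}
    row["_cdc_op"] = event["type"]           # insert / update / delete
    row["_cdc_ts"] = event.get("timestamp")  # event time from the binlog
    target = "{}.{}".format(event["database"], event["table_name"])
    return target, row


target, row = flatten_cdc_event(SAMPLE_EVENT)
print(target, row)
```

Inside NiFi the same reshaping would normally be done with existing processors or a record reader/writer pair rather than custom code; the sketch only shows the transformation those steps need to perform before the Avro-to-ORC conversion.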

Re: How to use Apache NiFi to set up a generic pipeline that streams real-time MySQL changes (CDC) to Hive?

Explorer

I found examples such as "Change Data Capture (CDC) with Apache NiFi", but they don't provide a generic approach: the "JsonPathReader" controller service has to be configured table by table. I am looking for a generic way that leverages a schema registry to parse the data. By the way, how do I use Hive Streaming with PutHDFS?
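To illustrate what I mean by "generic": if each change event carries column names and type codes, a schema could be derived per table instead of being written by hand. The Python sketch below is only an assumption-laden illustration, not a NiFi feature. It assumes each entry in `columns` has `id`, `name`, and a `column_type` holding a `java.sql.Types` code; the `JDBC_TO_AVRO` mapping and the sample event are hypothetical.

```python
import json

# Subset of java.sql.Types codes mapped to Avro primitive types.
# Anything not listed falls back to "string" in the schema.
JDBC_TO_AVRO = {
    -7: "boolean",  # BIT
    16: "boolean",  # BOOLEAN
    -6: "int",      # TINYINT
    5: "int",       # SMALLINT
    4: "int",       # INTEGER
    -5: "long",     # BIGINT
    7: "float",     # REAL
    6: "double",    # FLOAT
    8: "double",    # DOUBLE
    1: "string",    # CHAR
    12: "string",   # VARCHAR
}


def avro_schema_for(event):
    """Build an Avro record schema from one parsed CDC row event."""
    fields = []
    for col in sorted(event.get("columns", []), key=lambda c: c["id"]):
        avro_type = JDBC_TO_AVRO.get(col.get("column_type"), "string")
        # Every field is nullable, because source columns may hold NULL.
        fields.append({"name": col["name"],
                       "type": ["null", avro_type],
                       "default": None})
    return {
        "type": "record",
        "namespace": event["database"],
        "name": event["table_name"],
        "fields": fields,
    }


# Illustrative event; values are made up.
event = {
    "type": "insert",
    "database": "shop",
    "table_name": "orders",
    "columns": [
        {"id": 2, "name": "status", "column_type": 12, "value": "NEW"},
        {"id": 1, "name": "id", "column_type": 4, "value": 42},
    ],
}
print(json.dumps(avro_schema_for(event), indent=2))
```

A schema generated like this could be stored in a schema registry under a name such as `database.table`, so that one reader configuration serves every table. Types that fall back to `string` (decimals, dates, timestamps) would still need their values converted to match.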

Re: How to use Apache NiFi to set up a generic pipeline that streams real-time MySQL changes (CDC) to Hive?

Explorer

Can anyone provide a sample Apache NiFi template? Thanks!

Re: How to use Apache NiFi to set up a generic pipeline that streams real-time MySQL changes (CDC) to Hive?

Explorer

Also, how can I monitor the performance of Apache NiFi?