How to use Apache NiFi to set up a generic pipeline for streaming real-time MySQL changes (CDC) to Hive?

Explorer

I want to set up an ODS (operational data store) in Hive that stays in sync with our MySQL database. I noticed that Apache NiFi can help build data pipelines visually. How can I use Apache NiFi to set up a generic pipeline that streams real-time MySQL changes from the binlog to Apache Hive / HDFS, so the data can be queried in Hive? Do I need to use Hive Streaming? Thanks!

Super Guru

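Use the CDC processor for MySQL (CaptureChangeMySQL), then ConvertAvroToORC, then PutHDFS. That is a nice option. Note that CaptureChangeMySQL emits its events as JSON, so they have to be converted to Avro (for example with ConvertRecord) before ConvertAvroToORC. Search the articles here and you'll find a few examples of this approach, and also of using Hive Streaming.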
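To make "keeping an ODS in sync" concrete: the target table should end up reflecting every insert, update and delete, applied in binlog order. The Python sketch below replays such events onto an in-memory keyed snapshot. The event shape (`type`, `table_name`, `columns` entries with `name`/`value`) is an assumption loosely modeled on CaptureChangeMySQL's JSON output, and the shared single-column primary key `id` is hypothetical. In a real pipeline this merge would happen on the Hive side (e.g. a periodic MERGE over staged change rows into an ACID table), not in Python.

```python
from typing import Any, Dict, List

# Assumed event shape (loosely modeled on CaptureChangeMySQL JSON output):
# {"type": "insert"|"update"|"delete", "table_name": str,
#  "columns": [{"name": str, "value": Any}, ...]}
Event = Dict[str, Any]
Row = Dict[str, Any]

DML_TYPES = ("insert", "update", "delete")


def row_of(event: Event) -> Row:
    """Turn the event's column list into a {column_name: value} dict."""
    return {col["name"]: col["value"] for col in event["columns"]}


def apply_events(events: List[Event], key: str = "id") -> Dict[str, Dict[Any, Row]]:
    """Replay CDC events in order and return {table_name: {primary_key: row}}.

    `key` is a hypothetical primary-key column assumed to exist in every table.
    """
    snapshot: Dict[str, Dict[Any, Row]] = {}
    for event in events:
        op = event.get("type")
        if op not in DML_TYPES:
            continue  # skip begin/commit/DDL events in this sketch
        table = snapshot.setdefault(event["table_name"], {})
        row = row_of(event)
        if op == "delete":
            table.pop(row[key], None)
        else:
            table[row[key]] = row  # insert or update: last write wins
    return snapshot
```

Replaying an insert, an update of the same key, and an insert followed by a delete of a second key leaves only the updated first row, which is exactly what the Hive ODS table should show after the same changes.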

Explorer

I found examples like "Change Data Capture (CDC) with Apache NiFi", but they don't provide a generic approach: the "JsonPathReader" controller service has to parse the data table by table. I'm looking for a generic way to parse the data using a schema registry. Also, how do I use Hive Streaming with PutHDFS?
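One way to avoid per-table JsonPath expressions is to handle every CDC event the same way: derive a schema name and an Avro record schema from the event itself, and flatten its column list into a record. The Python sketch below assumes a CaptureChangeMySQL-style JSON layout (`database`, `table_name`, and `columns` entries carrying `name`, `column_type` as a `java.sql.Types` code, and `value`). The `<database>.<table>` naming convention, the `_op`/`_database`/`_table` metadata fields and the partial type mapping are hypothetical choices. Inside NiFi the same idea would normally be expressed with record-oriented processors and a schema registry controller service rather than custom code.

```python
from typing import Any, Dict

# Partial java.sql.Types code -> Avro primitive type mapping (an assumption;
# unknown codes, including DECIMAL/DATE/TIMESTAMP, fall back to "string").
JDBC_TO_AVRO = {
    4: "int", 5: "int", -6: "int",            # INTEGER, SMALLINT, TINYINT
    -5: "long",                                # BIGINT
    6: "double", 8: "double",                  # FLOAT, DOUBLE
    16: "boolean",                             # BOOLEAN
    1: "string", 12: "string", -1: "string",   # CHAR, VARCHAR, LONGVARCHAR
}

# Hypothetical metadata fields added to every record.
META_FIELDS = ("_op", "_database", "_table")


def schema_name(event: Dict[str, Any]) -> str:
    """Registry lookup key under the hypothetical <database>.<table> convention."""
    return f"{event['database']}.{event['table_name']}"


def avro_schema(event: Dict[str, Any]) -> Dict[str, Any]:
    """Build an Avro record schema from the event's own column metadata."""
    fields = [{"name": f, "type": "string"} for f in META_FIELDS]
    for col in event["columns"]:
        avro_type = JDBC_TO_AVRO.get(col.get("column_type"), "string")
        # Every column is nullable so NULL values in MySQL are representable.
        fields.append({"name": col["name"], "type": ["null", avro_type]})
    return {
        "type": "record",
        "name": event["table_name"],
        "namespace": event["database"],
        "fields": fields,
    }


def to_record(event: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten any table's event into one record: metadata plus column values."""
    record = {
        "_op": event["type"],
        "_database": event["database"],
        "_table": event["table_name"],
    }
    for col in event["columns"]:
        record[col["name"]] = col.get("value")
    return record
```

Because nothing in these functions names a specific table, the same logic serves every table in the binlog; the schema produced for a table only needs to be registered once and can then be looked up by `schema_name(event)`.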

Explorer

Can anyone provide a sample Apache NiFi template? Thanks!

Explorer

And how can I monitor the performance of Apache NiFi?
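Besides the Summary page and per-component Status History in the NiFi UI, overall flow health can be polled from NiFi's REST API. The Python sketch below assumes an unsecured NiFi 1.x instance and the `GET /nifi-api/flow/status` endpoint returning a `controllerStatus` object with fields such as `activeThreadCount`, `flowFilesQueued`, `bytesQueued` and `invalidCount`; verify the endpoint and field names against the REST API docs for your version. The backlog threshold is a hypothetical value.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_controller_status(base_url: str) -> Dict[str, Any]:
    """GET <base_url>/nifi-api/flow/status (assumes no authentication/TLS)."""
    url = f"{base_url}/nifi-api/flow/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(status: Dict[str, Any], max_queued: int = 10_000) -> Dict[str, Any]:
    """Pick out a few health indicators from the controller status JSON.

    `max_queued` is a hypothetical alert threshold on queued FlowFiles; a
    steadily growing queue usually means a downstream processor (e.g. the
    HDFS or Hive writer) cannot keep up with the CDC input rate.
    """
    cs = status.get("controllerStatus", {})
    queued = int(cs.get("flowFilesQueued", 0))
    return {
        "active_threads": int(cs.get("activeThreadCount", 0)),
        "flowfiles_queued": queued,
        "bytes_queued": int(cs.get("bytesQueued", 0)),
        "invalid_components": int(cs.get("invalidCount", 0)),
        "backlog_alert": queued > max_queued,
    }
```

Usage would look like `summarize(fetch_controller_status("http://localhost:8080"))`, run on a schedule (cron or similar) with the result pushed to whatever alerting system is already in place.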
