Can you please explain the Apache Kafka lifecycle with sources and sinks?

1 ACCEPTED SOLUTION

Just to make sure we are in step on nomenclature: "Sources" and "Sinks" are http://flume.apache.org terminology, whereas http://kafka.apache.org is all about Publishers and Subscribers (called producers and consumers in Kafka's own documentation) that interact through Topics (aka message queues) persisted in a Kafka cluster. If that makes sense and you just want to understand the interactions between Kafka publishers and subscribers, check out http://kafka.apache.org/intro for some introductory material.
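To make the publisher/subscriber idea concrete, here is a toy in-memory sketch. It is not Kafka's client API: `ToyTopic`, `publish`, `poll`, and the group names are all made up for illustration. It only tries to show two things about the model: a topic keeps messages after they are read, and each subscriber group tracks its own position (offset) in the topic.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a topic: an append-only log."""

    def __init__(self, name):
        self.name = name
        self.log = []                    # messages stay in the log after being read
        self.offsets = defaultdict(int)  # next offset to read, per subscriber group

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, group, max_messages=10):
        """Return unread messages for this group and advance its offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


topic = ToyTopic("web-logs")
topic.publish("GET /index.html")
topic.publish("GET /about.html")

# Two independent subscriber groups each see every message.
first_read = topic.poll("analytics")
second_read = topic.poll("archiver")

# A group that has caught up gets nothing until new messages arrive.
third_read = topic.poll("analytics")

print(first_read, second_read, third_read)
```

A real cluster adds partitions, replication, and retention limits on top of this, so treat the sketch as a mental model only.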

On the Flume front, it seems the Kafka Source and Sink options became available in 1.6.0, as seen in the current (1.7.0) user guide at https://flume.apache.org/FlumeUserGuide.html. As a point of reference, HDP 2.5 includes Flume 1.5.2, as detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.ht..., so the Kafka Source and Sink are not yet available via HDP.

3 REPLIES

Thank you, @Lester Martin! Does Apache Kafka or Apache Flume support SQL Server as a source?

Since I don't know the answer to "what do you want to do", I invite you to take a peek at the responses to https://community.hortonworks.com/questions/12787/how-to-integrate-kafka-to-pull-data-from-rdbms.htm... as that question follows the same line of thinking (I believe).

Technically, Kafka does have a Connector API, http://kafka.apache.org/documentation.html#connect, which could theoretically do what you are asking, but I do not know anyone who has done exactly that with Kafka (most folks are writing more traditional pub/sub clients). As for "in practice", I did a quick Google search for "kafka connect sql server" and found two non-open-source solutions that work with Kafka Connect to do what you described, but it doesn't look like there is a completely open-source solution available at the moment.

On the Flume front, I think there is only a JDBC Channel, not a source or sink (at least not in 1.5.2, which ships with HDP 2.5). I'm thinking NiFi (aka HDF) and/or Sqoop might be better tools for retrieving data from an RDBMS like SQL Server.
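For what it's worth, the general pattern behind "use an RDBMS as a source" is usually an incremental poll: remember the highest key already shipped, query for anything newer, and publish those rows. Below is a minimal sketch of that idea only. The assumptions are loud ones: `sqlite3` stands in for SQL Server, a plain Python list stands in for a Kafka topic, and the `orders` table plus the `poll_new_rows` and `publish_rows` helpers are hypothetical names, not part of Kafka Connect, Sqoop, or NiFi.

```python
import sqlite3


def poll_new_rows(conn, last_id):
    """Fetch rows whose id is greater than last_id, oldest first."""
    cursor = conn.execute(
        "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id",
        (last_id,),
    )
    return cursor.fetchall()


def publish_rows(rows, topic, last_id):
    """Append each row to the stand-in topic and return the highest id seen."""
    for row in rows:
        topic.append({"id": row[0], "customer": row[1], "amount": row[2]})
        last_id = max(last_id, row[0])
    return last_id


# sqlite3 in-memory database used here only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.5)],
)

orders_topic = []  # a list standing in for a topic
last_id = 0

# First poll ships the two existing rows.
last_id = publish_rows(poll_new_rows(conn, last_id), orders_topic, last_id)

# A new row arrives; the next poll ships only that row.
conn.execute(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)", ("carol", 7.25)
)
last_id = publish_rows(poll_new_rows(conn, last_id), orders_topic, last_id)

print(len(orders_topic), last_id)
```

Note that polling on an increasing key like this only picks up inserts; updates and deletes to existing rows would need a different approach, such as a last-modified timestamp column or change data capture.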