Can you please explain the Apache Kafka lifecycle with sources and sinks?

1 ACCEPTED SOLUTION

Just to make sure we are in step on nomenclature: "Sources" and "Sinks" are http://flume.apache.org terminology, whereas http://kafka.apache.org is all about Publishers and Subscribers that interact through Topics (aka message queues) that are persisted in a Kafka Cluster. If that makes sense and you just want to understand the interactions between Kafka publishers and subscribers, then check out http://kafka.apache.org/intro for some introductory material.
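To make the publisher/subscriber idea a bit more concrete, here is a toy in-memory sketch in Python. It is not the Kafka client API; `ToyBroker`, `publish`, `poll`, and the topic and group names are all made up for illustration. It only shows the general shape: publishers append to a topic, the topic keeps what was written, and each subscriber group reads from its own position.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a Kafka cluster (illustration only)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> ordered, retained records
        self._offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, record):
        """Append a record to the topic and return the offset it was stored at."""
        log = self._topics[topic]
        log.append(record)
        return len(log) - 1

    def poll(self, group, topic, max_records=10):
        """Return unread records for this subscriber group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_records]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


broker = ToyBroker()

# A publisher writes two records to the (made-up) "clicks" topic.
broker.publish("clicks", "page=/home")
broker.publish("clicks", "page=/pricing")

# Two independent subscriber groups each see every record, because records
# stay in the topic after being read.
first_for_billing = broker.poll("billing", "clicks")
first_for_audit = broker.poll("audit", "clicks")

# A group that has already caught up gets nothing new on its next poll.
second_for_billing = broker.poll("billing", "clicks")
```

The point of the sketch is that reading does not remove anything from the topic, so the "billing" and "audit" groups progress independently of each other.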

On the Flume front, Kafka Source and Sink options appear to have become available in 1.6.0, as seen in the current (1.7.0) user guide at https://flume.apache.org/FlumeUserGuide.html. As a point of reference, HDP 2.5 includes Flume 1.5.2, as detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.ht..., so those options are not yet available via HDP.

3 REPLIES

Thanks @Lester Martin! Does Apache Kafka or Apache Flume support SQL Server as a source?

As I don't know the answer to "what do you want to do", I invite you to take a peek at the responses to https://community.hortonworks.com/questions/12787/how-to-integrate-kafka-to-pull-data-from-rdbms.htm... as it is along the same line of thinking (I believe).

Technically, Kafka does have a Connector API, http://kafka.apache.org/documentation.html#connect, which could theoretically do what you are asking, but I do not know anyone who has done exactly that with Kafka (most folks are writing more traditional pub/sub clients). As for "in practice", I did a quick Google search for "kafka connect sql server" and found two non-open-source solutions that work with Kafka Connect to do what you said, but it doesn't look like there is a completely open-source solution available at the moment.

On the Flume front, I think there is only a JDBC Channel, not a source or sink (at least not in 1.5.2, which ships with HDP 2.5). I'm thinking NiFi (aka HDF) and/or Sqoop might be better tools for retrieving data from an RDBMS like SQL Server.
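For what it's worth, whichever tool you pick, "a database as a source" usually boils down to repeatedly pulling rows you have not seen yet. Below is a rough Python sketch of that idea, under several assumptions: `sqlite3` stands in for SQL Server, the `orders` table and `fetch_new_rows` helper are hypothetical, and the `published` list is only a placeholder for handing rows to a Kafka publisher. It is not how NiFi, Sqoop, or Kafka Connect are implemented.

```python
import sqlite3


def fetch_new_rows(conn, last_id, batch_size=100):
    """Return (id, payload) rows with id greater than last_id, oldest first."""
    cursor = conn.execute(
        "SELECT id, payload FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    )
    return cursor.fetchall()


# Assumption: an in-memory SQLite database plays the role of SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO orders (payload) VALUES (?)", [("a",), ("b",), ("c",)])

published = []  # placeholder for records handed to a publisher
last_id = 0     # highest id already pulled; a real tool would persist this

# First pull: everything currently in the table.
for row_id, payload in fetch_new_rows(conn, last_id):
    published.append(payload)
    last_id = row_id

# A new row arrives later; the next pull picks up only that row.
conn.execute("INSERT INTO orders (payload) VALUES (?)", ("d",))
second_batch = fetch_new_rows(conn, last_id)
```

Tracking the last id pulled only catches inserts. Updates and deletes would need something else, such as a modified-timestamp column or change data capture, which is part of why a purpose-built tool is probably the better route.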