Can you please explain the Apache Kafka lifecycle with sources and sinks?

1 ACCEPTED SOLUTION

Just to make sure we are in step on nomenclature: "Sources" and "Sinks" are http://flume.apache.org terminology, whereas http://kafka.apache.org is all about Publishers and Subscribers (Kafka's own docs call them producers and consumers) that interact through Topics (aka message queues) that are persisted in a Kafka cluster. If that makes sense and you just want to understand the interactions between Kafka publishers and subscribers, then check out http://kafka.apache.org/intro for some introductory material.
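
If it helps to picture the publisher/subscriber interaction, here is a toy in-memory sketch in Python. It is not the Kafka client API and it ignores partitions, replication, and persistence; `ToyBroker`, `publish`, `poll`, and the topic and group names are all made up for illustration. The only idea it tries to show is that a topic behaves like an append-only log and each subscriber group keeps its own read position.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a cluster: each topic is an append-only list."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> ordered messages
        self._offsets = defaultdict(int)   # (group, topic) -> next index to read

    def publish(self, topic, message):
        """Append a message to the topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, group, topic, max_messages=10):
        """Return unread messages for this subscriber group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


broker = ToyBroker()

# A publisher writes two messages to the "orders" topic.
broker.publish("orders", "order-1")
broker.publish("orders", "order-2")

# Two independent subscriber groups each read the topic from the start.
billing_first = broker.poll("billing", "orders")
audit_first = broker.poll("audit", "orders")

# A later message is only new to a group that has already caught up.
broker.publish("orders", "order-3")
billing_second = broker.poll("billing", "orders")
```

Messages are not removed when one group reads them, which is why both "billing" and "audit" get the full history here.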

On the Flume front, it seems Kafka Source and Sink options became available in 1.6.0, as seen in the current (1.7.0) user guide at https://flume.apache.org/FlumeUserGuide.html. As a point of reference, HDP 2.5 includes Flume 1.5.2, as detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.ht..., so those options are not yet available via HDP.

3 REPLIES 3

Thank you, @Lester Martin! Does Apache Kafka or Apache Flume support SQL Server as a source?

As I don't know the answer to "what do you want to do", I invite you to take a peek at the responses to https://community.hortonworks.com/questions/12787/how-to-integrate-kafka-to-pull-data-from-rdbms.htm... as it is along the same line of thinking (I believe).

Technically, Kafka does have a Connector API (Kafka Connect), http://kafka.apache.org/documentation.html#connect, which could theoretically do what you are asking, but I do not know anyone who has done exactly that with Kafka (most folks are writing more traditional pub/sub clients). As for "in practice", I did a quick Google search for "kafka connect sql server" and found two non-open-source solutions that work with Kafka Connect to do what you described, but it doesn't look like there is a completely open-source solution available at the moment.
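
To give a rough feel for what "a database as a source" means, here is a small Python sketch of the incremental-polling idea: remember the highest row id already seen and only fetch newer rows on each pass. This is not Kafka Connect code and not SQL Server specific. It uses the standard-library `sqlite3` module as a stand-in database, and the `events` table, the `poll_new_rows` helper, and the `published` list (standing in for a topic) are assumptions for illustration only.

```python
import sqlite3


def poll_new_rows(conn, last_id):
    """Return rows with id greater than last_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id


# Stand-in database with a hypothetical "events" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])

published = []   # stand-in for messages sent to a topic
last_id = 0      # nothing has been read yet

# First pass picks up the two existing rows.
rows, last_id = poll_new_rows(conn, last_id)
published.extend(rows)

# A row inserted later is picked up on the next pass, without re-reading old rows.
conn.execute("INSERT INTO events (payload) VALUES (?)", ("c",))
rows, last_id = poll_new_rows(conn, last_id)
published.extend(rows)
```

A real setup would also need to persist `last_id` across restarts and handle updates and deletes, which a simple id-based poll like this cannot see.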

On the Flume front, I think there is only a JDBC Channel, not a source or sink (at least not in 1.5.2, which ships with HDP 2.5). I'm thinking NiFi (aka HDF) and/or Sqoop might be better tools for retrieving data from an RDBMS like SQL Server.