Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

can you please explain the apache kafka lifecycle with sources and sinks?

Solved Go to solution

can you please explain the apache kafka lifecycle with sources and sinks?

Explorer
 
1 ACCEPTED SOLUTION

Accepted Solutions

Re: can you please explain the apache kafka lifecycle with sources and sinks?

Just to make sure we are in-step on nomenclature. "Sources" and "Sinks" are http://flume.apache.org terminology as http://kafka.apache.org is all about Publishers and Subscribers that interact through Topics (aka message queues) that are persisted in a Kafka Cluster. If that makes sense and you just want to understand the interactions between Kafka publishers & subscribers then check out http://kafka.apache.org/intro for some introductory material.

On the Flume front, it seems in 1.6.0 Kafka Source & Sink options became available as seen in the current (1.7.0) user guide at https://flume.apache.org/FlumeUserGuide.html. As a point of reference HDP 2.5 includes Flume 1.5.2 as detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.ht..., so that is not yet available via HDP.

3 REPLIES 3

Re: can you please explain the apache kafka lifecycle with sources and sinks?

Just to make sure we are in-step on nomenclature. "Sources" and "Sinks" are http://flume.apache.org terminology as http://kafka.apache.org is all about Publishers and Subscribers that interact through Topics (aka message queues) that are persisted in a Kafka Cluster. If that makes sense and you just want to understand the interactions between Kafka publishers & subscribers then check out http://kafka.apache.org/intro for some introductory material.

On the Flume front, it seems in 1.6.0 Kafka Source & Sink options became available as seen in the current (1.7.0) user guide at https://flume.apache.org/FlumeUserGuide.html. As a point of reference HDP 2.5 includes Flume 1.5.2 as detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.ht..., so that is not yet available via HDP.

Re: can you please explain the apache kafka lifecycle with sources and sinks?

Explorer

Tq @Lester Martin ..!! is Apache kafka or Apache Flume supports SQL Server As A Source?

Re: can you please explain the apache kafka lifecycle with sources and sinks?

As I don't know the answer to "what do you want to do", I invite you to take a peek at the responses to https://community.hortonworks.com/questions/12787/how-to-integrate-kafka-to-pull-data-from-rdbms.htm... as it is along the same line of thinking (I believe).

Technically, Kafka does have a Connector API, http://kafka.apache.org/documentation.html#connect, which could theoretically could do what you are asking, but I do not know anyone who has done exactly that with Kafka (mostly folks doing more traditional pub/sub clients). As for "in practice", I did a quick Google search for "kafka connect sql server" and found two non open-source solutions that work with Kafka Connect to do what you said, but it doesn't look like there is a completely open-source solution available at the moment.

On the Flume front, I think there is only a JDBC Channel, not a source or sink (at least not in 1.5.2 which ships with HDP 2.5). I'm thinking NiFi (aka HDF) and/or Sqoop might be better tools for retrieving data from a RDBMS like SQL Server.

Don't have an account?
Coming from Hortonworks? Activate your account here