Created 11-13-2017 11:33 PM
I'm looking for ways to get data from Kafka to Python.
Currently I'm using the pipeline below. Has anyone faced issues using Flume?
Flume(exec-source and Kafka-sink) --> Kafka --> Flume(kafka-source and HDFS-sink)
Other options: if I already have a Kafka consumer written, is there a Python way of getting the data from that consumer into HDFS (other than Confluent's Connect API)?
Or are there any other means of getting the data from Kafka to HDFS?
Created 11-14-2017 05:29 AM
Hi Swaapnika, I've tried using Flume for that and had no issues.
For Python, take a look at this repository: https://github.com/edenhill/librdkafka (librdkafka itself is the C/C++ client; the Python clients built on it wrap this library). I guess it's the most complete one.
Created 11-14-2017 05:44 PM
I see that Flume is deprecated and will be removed from HDP in future releases, as mentioned in the HDP 2.6.2 Release Notes. Are there any other techniques that could be used with Kafka to get data into HDFS?
Created 11-14-2017 07:31 PM
@Swaapnika Guntaka You could use Spark Streaming in PySpark to consume a topic and write the data to HDFS.
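As a rough sketch of that idea: the snippet below uses Spark's Structured Streaming Kafka source (rather than the older DStream API) to copy a topic into text files on HDFS. The function names, broker address, topic and paths are all placeholders I made up, and it assumes the spark-sql-kafka package is on the job's classpath, so treat it as a starting point rather than a tested job.

```python
def kafka_source_options(brokers, topic, starting_offsets="latest"):
    """Build the option map for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": brokers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def stream_topic_to_hdfs(brokers, topic, hdfs_dir, checkpoint_dir):
    """Continuously copy message values from a Kafka topic into HDFS text files."""
    # Imported lazily so the helper above is usable without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # The Kafka source yields binary key/value columns; keep the value as text.
    lines = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(brokers, topic))
        .load()
        .selectExpr("CAST(value AS STRING) AS value")
    )

    # The text sink expects a single string column; the checkpoint directory
    # lets the query resume from its last committed offsets after a restart.
    query = (
        lines.writeStream.format("text")
        .option("path", hdfs_dir)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
    query.awaitTermination()
```

You would call something like stream_topic_to_hdfs("broker1:6667", "events", "hdfs:///data/events", "hdfs:///checkpoints/events") from a script submitted with spark-submit; every one of those values is an assumption to replace with your own.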
You could also use HDF with NiFi and skip Python entirely.
Also, this is a Python client by Confluent; it is not related to Kafka Connect: https://github.com/confluentinc/confluent-kafka-python
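If you go the plain-consumer route, one possible shape is sketched below: poll messages with that client, collect them into batches, and push each batch to HDFS with the `hdfs dfs -put` command. The helper names (drain_batches, hdfs_put_sink, run), the batch size and the file naming are my own assumptions, and shelling out to the hdfs CLI is just one way to do the write, so adjust to taste.

```python
import os
import subprocess
import tempfile
import time


def drain_batches(consumer, sink, batch_size=1000, poll_timeout=1.0,
                  max_empty_polls=None):
    """Poll `consumer` and pass lists of decoded message values to `sink`.

    Runs until `max_empty_polls` consecutive polls return nothing
    (forever when it is None), then flushes any partial batch.
    """
    batch, empty_polls = [], 0
    while max_empty_polls is None or empty_polls < max_empty_polls:
        msg = consumer.poll(poll_timeout)
        if msg is None:
            empty_polls += 1
            continue
        if msg.error():
            # Sketch only: every error event is fatal here. Some client
            # versions report end-of-partition this way; you may prefer to skip it.
            raise RuntimeError(str(msg.error()))
        empty_polls = 0
        batch.append(msg.value().decode("utf-8"))
        if len(batch) >= batch_size:
            sink(batch)
            batch = []
    if batch:
        sink(batch)


def hdfs_put_sink(hdfs_dir):
    """Return a sink that writes each batch to a new file under `hdfs_dir`."""
    def sink(batch):
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
            tmp.write("\n".join(batch) + "\n")
        # Hypothetical naming scheme: one file per batch, keyed by timestamp.
        target = "{}/batch-{}.txt".format(hdfs_dir.rstrip("/"),
                                          int(time.time() * 1000))
        try:
            subprocess.check_call(["hdfs", "dfs", "-put", tmp.name, target])
        finally:
            os.remove(tmp.name)
    return sink


def run(brokers, group_id, topic, hdfs_dir):
    """Wire a confluent-kafka Consumer to the HDFS sink."""
    # Imported lazily so the helpers above work without the client installed.
    from confluent_kafka import Consumer

    consumer = Consumer({"bootstrap.servers": brokers, "group.id": group_id})
    consumer.subscribe([topic])
    try:
        drain_batches(consumer, hdfs_put_sink(hdfs_dir))
    finally:
        consumer.close()
```

Batching matters here because HDFS handles a few large files much better than many tiny ones, so writing one file per message is best avoided.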
Created 11-14-2017 07:53 PM
Is there a difference between the kafka-connector in the Python module and Confluent's one? This is the GitHub link for the one mentioned in the Python module,
Created 11-15-2017 07:29 PM
Confluent is the company founded by Kafka's original creators, and it provides commercial support for Kafka. I personally would trust their code more than someone else's.