I'm looking for ways to get data from Kafka to Python.
Currently I'm using this pipeline. Has anyone faced issues with using Flume?
Flume(exec-source and Kafka-sink) --> Kafka --> Flume(kafka-source and HDFS-sink)
Other options: if I already have a Kafka consumer written, is there a Python way to get the data from that consumer into HDFS (other than Confluent's Connect API)?
Or are there any other means by which I can get the data from Kafka to HDFS?
I see Flume is deprecated and will be removed from HDP in a future release, as mentioned in the HDP-2.6.2 Release Notes. Are there any other techniques that could be used with Kafka to get data into HDFS?
@Swaapnika Guntaka You could use Spark Streaming in PySpark to consume a topic and write the data to HDFS.
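A rough sketch of what that could look like is below. It uses Spark's Structured Streaming Kafka source rather than the older DStream-based Spark Streaming API, so treat it as one possible variant and not the only way. The broker address, topic name, and HDFS paths are placeholders I made up, and the job assumes the spark-sql-kafka-0-10 package matching your Spark build is on the classpath.

```python
def build_kafka_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Return the options for Spark's Kafka source as a plain dict."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def run(bootstrap_servers, topic, hdfs_path, checkpoint_path):
    """Stream one Kafka topic into Parquet files under hdfs_path."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Each row from the Kafka source carries binary key/value columns.
    stream = (
        spark.readStream.format("kafka")
        .options(**build_kafka_options(bootstrap_servers, topic))
        .load()
    )

    # Cast key and value to strings and append them to HDFS as Parquet.
    query = (
        stream.selectExpr("CAST(key AS STRING) AS key",
                          "CAST(value AS STRING) AS value")
        .writeStream.format("parquet")
        .option("path", hdfs_path)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
    query.awaitTermination()
```

You would call something like run("broker1:6667", "events", "hdfs:///data/events", "hdfs:///checkpoints/events") from a script submitted with spark-submit; all four values are hypothetical and need to match your cluster.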
You could also use HDF with NiFi and skip Python entirely.
Also, Confluent publishes a Python client that is not related to Kafka Connect: https://github.com/confluentinc/confluent-kafka-python
Confluent was founded by the original creators of Kafka and offers commercial support for it. I personally would trust their code more than someone else's.
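If you go the plain-consumer route, a minimal sketch with that client might look like the following. It polls messages into small batches and pipes each batch to HDFS through the `hdfs dfs -put -` command, which reads from stdin. The broker address, group id, topic, output directory, and helper names are all assumptions for illustration, and shelling out to the hdfs CLI is just one way to do the write, not something the client provides.

```python
import subprocess
import time


def drain_batch(consumer, max_messages=500, timeout=1.0):
    """Poll until max_messages are collected or a poll comes back empty."""
    batch = []
    while len(batch) < max_messages:
        msg = consumer.poll(timeout)
        if msg is None:
            break  # nothing arrived within the timeout
        if msg.error():
            # Assumption: treat any reported error as fatal for this sketch.
            raise RuntimeError(str(msg.error()))
        batch.append(msg.value())
    return batch


def join_records(batch):
    """Join raw message values into one newline-terminated payload."""
    return b"".join(value + b"\n" for value in batch)


def put_to_hdfs(payload, hdfs_path):
    """Write the payload to a new HDFS file via the hdfs CLI (stdin upload)."""
    subprocess.run(["hdfs", "dfs", "-put", "-", hdfs_path],
                   input=payload, check=True)


def main():
    # Imported here so the helpers above work without the client installed.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "broker1:6667",   # hypothetical broker
        "group.id": "hdfs-writer",             # hypothetical group id
        "auto.offset.reset": "earliest",
        "enable.auto.commit": False,
    })
    consumer.subscribe(["events"])             # hypothetical topic
    try:
        while True:
            batch = drain_batch(consumer)
            if not batch:
                continue
            # One file per batch, named by the current time in milliseconds.
            path = "/data/events/part-%d.txt" % int(time.time() * 1000)
            put_to_hdfs(join_records(batch), path)
            # Commit offsets only after the batch has landed in HDFS.
            consumer.commit(asynchronous=False)
    finally:
        consumer.close()
```

Calling main() starts the loop. Committing only after the upload succeeds means a crash can produce duplicate files but should not silently drop messages; whether that trade-off fits depends on your use case.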