Created 04-16-2021 04:14 PM
We are looking for some kind of utility or tool to read data from HDFS and publish it to a Kafka topic. Appreciate your inputs.
From the community section, we came across this: "You could use Apache NiFi with a ListHDFS + FetchHDFS processor followed by PublishKafka". Can you provide more insight into how this can be achieved?
Thank you
Srinu
Created 04-26-2021 09:58 PM
Hello @sriven
Found this - https://community.cloudera.com/t5/Support-Questions/How-to-insert-parquet-file-to-Kafka-and-pass-the...
Please let me know if it helps.
Thanks & Regards,
Nandini
Created 04-28-2021 11:46 AM
Created 04-28-2021 02:34 PM
Please try Kafka Connect then; that seems to be the best-suited option.
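For reference, a Kafka Connect source job is normally submitted as a JSON config to the Connect REST API. A rough sketch for the Confluent HDFS 2 Source connector might look like the fragment below; note that the connector class and property names here are assumptions from memory and should be verified against the Confluent documentation for your version, and all values (hosts, ports, topic) are placeholders:

```
{
  "name": "hdfs-source",
  "config": {
    "connector.class": "io.confluent.connect.hdfs2.Hdfs2SourceConnector",
    "tasks.max": "1",
    "hdfs.url": "hdfs://namenode:8020",
    "confluent.topic.bootstrap.servers": "broker1:9092"
  }
}
```

This would typically be POSTed to the Connect worker's `/connectors` endpoint.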
Created on 04-30-2021 03:36 PM - edited 04-30-2021 03:39 PM
How do we read Parquet files using Kafka Connect?
Simply put, we just want to read the Parquet files on HDFS using Kafka Connect, without Spark jobs.
Please let us know whether or not there is a solution.
Created 05-03-2021 07:44 AM
As you know,
we have a limitation with the source Kafka connector: it works only for HDFS objects/files created by the HDFS 2 Sink Connector for Confluent Platform.
How can we pull files created by other jobs (Spark, MapReduce, or anything else) on HDFS?
The use case of the HDFS source connector is only to mirror the same data back onto Kafka.
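If the HDFS source connector only mirrors files written by the HDFS 2 Sink, one workaround outside both Connect and Spark is a small client-side script. The sketch below assumes pyarrow (which can read Parquet datasets over `hdfs://` URIs) and kafka-python are installed; the broker address, HDFS path, and topic name are placeholders, not values from this thread:

```python
import json


def rows_to_messages(rows):
    """Serialize each row (a dict) into a UTF-8 JSON payload for Kafka."""
    return [json.dumps(row, sort_keys=True).encode("utf-8") for row in rows]


if __name__ == "__main__":
    # These imports assume pyarrow and kafka-python are available;
    # hostnames, the HDFS path, and the topic name are placeholders.
    import pyarrow.dataset as ds
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="broker1:9092")
    dataset = ds.dataset("hdfs://namenode:8020/data/landing", format="parquet")
    # Stream record batches so large Parquet files are not loaded at once.
    for batch in dataset.to_batches():
        for message in rows_to_messages(batch.to_pylist()):
            producer.send("hdfs-ingest", value=message)
    producer.flush()
```

Serializing rows to JSON keeps downstream consumers decoupled from the Parquet format; an Avro or Protobuf serializer with a schema registry would be a more production-grade choice.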
Created 05-06-2021 10:34 PM
Please try NiFi - Kafka.
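To expand on the NiFi suggestion from earlier in the thread, a minimal flow would chain three stock processors. The processor and property names below come from standard NiFi; the directory, broker address, and topic name are placeholders to adjust for your cluster:

```
ListHDFS
  Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  Directory                      : /data/landing        (placeholder path)
  Recurse Subdirectories         : true
        |
        v  (success)
FetchHDFS
  HDFS Filename                  : ${path}/${filename}  (default; uses attributes set by ListHDFS)
        |
        v  (success)
PublishKafka
  Kafka Brokers                  : broker1:9092         (placeholder)
  Topic Name                     : hdfs-ingest          (placeholder)
  Delivery Guarantee             : Guarantee Replicated Delivery
```

ListHDFS keeps state, so only newly arrived files are listed on each run; FetchHDFS then pulls the content, and PublishKafka writes each flowfile's bytes to the topic.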