Support Questions

sriven · ‎04-16-2021

We are looking for a utility or tool that reads data from HDFS and publishes it to a Kafka topic. We would appreciate your input.

In the community section we came across this suggestion: "You could use Apache NiFi with a ListHDFS + FetchHDFS processor followed by PublishKafka". Can you provide more insight into how this can be achieved?

Thank you
Srinu

Nandinin · ‎04-26-2021

Hello @sriven

Found this - https://community.cloudera.com/t5/Support-Questions/How-to-insert-parquet-file-to-Kafka-and-pass-the...

Please let me know if it helps.

Thanks & Regards,

Nandini

SME || Kafka | Schema Registry | SMM | SRM

sriven · ‎04-28-2021

Hello @Nandinin ,

We have gone through this already.

Is there any option that does not involve Scala/Spark?

Nandinin · ‎04-28-2021

Please try Kafka Connect then; it seems to be the best-suited option.

SME || Kafka | Schema Registry | SMM | SRM

sriven · ‎04-30-2021

How can we read Parquet files using Kafka Connect?

Simply put, we just want to read the Parquet files on HDFS using Kafka Connect, without Spark jobs.

Please let us know whether there is a solution.
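For the question above, one Spark-free route is a small standalone script. The following is only a minimal sketch under stated assumptions, not a solution confirmed in this thread: it assumes the `pyarrow` and `kafka-python` packages are installed and that the machine has a working Hadoop client (libhdfs). The host `namenode`, port `8020`, broker `broker1:9092`, topic `hdfs-events`, and the file path are all hypothetical placeholders.

```python
import json
from typing import Callable, Iterable, Iterator


def rows_to_messages(rows: Iterable[dict]) -> Iterator[bytes]:
    """Serialize each row (a dict) to one UTF-8 encoded JSON message."""
    for row in rows:
        # default=str stringifies values JSON cannot encode (dates, decimals).
        yield json.dumps(row, default=str).encode("utf-8")


def publish(rows: Iterable[dict], send: Callable[[bytes], object]) -> int:
    """Pass every serialized row to `send` and return the number sent."""
    count = 0
    for message in rows_to_messages(rows):
        send(message)
        count += 1
    return count


def read_parquet_rows(path, host="default", port=0, batch_size=1000):
    """Yield rows from one Parquet file on HDFS, one batch at a time."""
    # Imported lazily so the helpers above work without pyarrow installed.
    import pyarrow.parquet as pq
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem(host, port)
    with hdfs.open_input_file(path) as handle:
        parquet_file = pq.ParquetFile(handle)
        for batch in parquet_file.iter_batches(batch_size=batch_size):
            yield from batch.to_pylist()


def main():
    """Example wiring; every name below is a hypothetical placeholder."""
    from kafka import KafkaProducer  # kafka-python package (assumption)

    producer = KafkaProducer(bootstrap_servers="broker1:9092")
    rows = read_parquet_rows(
        "/data/events/part-00000.parquet", host="namenode", port=8020
    )
    sent = publish(rows, lambda m: producer.send("hdfs-events", value=m))
    producer.flush()  # block until buffered messages are delivered
    print(f"published {sent} messages")
```

Calling `main()` on an edge node would run the whole flow. The `send` callable is injected so the serialization logic can be checked without a broker, and JSON is used only for simplicity; Avro with a schema registry would be a different serializer in the same place.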

sriven · ‎05-03-2021

As you know, the Kafka Connect HDFS source connector has a limitation: it works only for HDFS objects/files created by the HDFS 2 Sink Connector for Confluent Platform.

How can we pull files that were created by Spark, MapReduce, or any other jobs on HDFS?

The only use case of the HDFS source connector is to mirror the same data on Kafka.
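For files written by Spark, MapReduce, or other jobs, a custom script would have to decide for itself which files to pick up. The sketch below is an illustration under assumptions, not a feature of any connector: it skips marker and hidden files such as `_SUCCESS`, keeps only names ending in `.parquet`, and remembers which paths were already handled. The function names and the directory layout are hypothetical, and the HDFS listing helper assumes the `pyarrow` package with libhdfs available.

```python
import posixpath
from typing import Iterable, List, Set


def select_new_parquet(paths: Iterable[str], done: Set[str]) -> List[str]:
    """Return Parquet data files not yet processed, in sorted order."""
    picked = []
    for path in paths:
        name = posixpath.basename(path)
        # Skip job markers (_SUCCESS) and hidden or checksum files (.foo.crc).
        if name.startswith(("_", ".")):
            continue
        if not name.endswith(".parquet"):
            continue
        if path in done:
            continue
        picked.append(path)
    return sorted(picked)


def list_hdfs_files(directory, host="default", port=0):
    """List all regular files under an HDFS directory (needs pyarrow)."""
    # Imported lazily so select_new_parquet works without pyarrow installed.
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem(host, port)
    infos = hdfs.get_file_info(fs.FileSelector(directory, recursive=True))
    return [info.path for info in infos if info.is_file]
```

A scheduled job could call `list_hdfs_files`, pass the result through `select_new_parquet` together with a persisted set of finished paths, and add each path to that set only after its rows have been published.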

Nandinin · ‎05-06-2021

Please try NiFi with Kafka:

https://community.cloudera.com/t5/Community-Articles/Apache-NiFi-1-10-Support-for-Parquet-RecordRead...

SME || Kafka | Schema Registry | SMM | SRM

Cloudera Community

How to read data from HDFS and place into Kafka (don’t want to use Scala/Spark)? Any utilities or methods?