
kafka data pipeline

I want to get data from a remote Windows location into HDFS / Spark Streaming using Apache Kafka, and I want this to happen automatically as data arrives in that folder. I am currently using WinSCP to load small files, but to automate this I need some other platform. How can I achieve this?


Re: kafka data pipeline

@pk reddy

One way is to use NiFi. You can refer to this article to get started: https://community.hortonworks.com/articles/26089/windows-share-nifi-hdfs-a-practical-guide.html (instead of the HDFS processor, use the PublishKafka processor).
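Outside of NiFi, the same flow (watch a folder, publish each new file to a Kafka topic) can be sketched in a few lines of Python. This is only a minimal sketch, not the NiFi approach itself; the topic name, directory path, and the kafka-python client are all assumptions, not something from the original post.

```python
# Minimal sketch of the watch-folder -> Kafka flow, assuming the directory
# (e.g. a mounted Windows share) is visible on the local filesystem and
# kafka-python is installed (pip install kafka-python).
import os
import time


def find_new_files(directory, seen):
    """Return paths in `directory` not yet in `seen`, and record them in `seen`."""
    new = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new.append(path)
    return new


def publish_loop(directory, topic, bootstrap="localhost:9092"):
    """Poll `directory` and publish each new file's bytes to `topic`."""
    from kafka import KafkaProducer  # imported here so the sketch runs without a broker
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    seen = set()
    while True:
        for path in find_new_files(directory, seen):
            with open(path, "rb") as f:
                producer.send(topic, f.read())
        producer.flush()
        time.sleep(5)  # poll interval in seconds
```

Note this sketch sends whole files as single messages, so it only suits small files; for large files or delivery guarantees, NiFi or Kafka Connect (below in this thread) are the more robust options.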


Re: kafka data pipeline


You should consider using Kafka for all ingestion: run Kafka Connect locally and point it at the directory.

http://kafka.apache.org/documentation/#connect

https://github.com/jcustenborder/kafka-connect-spooldir
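A standalone Kafka Connect worker with the spooldir connector is configured with a properties file along these lines. The property names are taken from the kafka-connect-spooldir project linked above; the paths and topic name here are placeholders, not values from the original post.

```properties
# spooldir-source.properties -- source connector that watches a directory
# (e.g. the mounted Windows share) and publishes each file to a topic.
name=windows-share-source
connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector
topic=ingest-topic
# Directory polled for new files:
input.path=/mnt/windows-share/incoming
# Files are moved here after successful / failed processing:
finished.path=/mnt/windows-share/finished
error.path=/mnt/windows-share/error
input.file.pattern=.*\.csv
# Derive the record schema from the CSV header row:
csv.first.row.as.header=true
schema.generation.enabled=true
```

Launched with the stock standalone runner, e.g. `connect-standalone connect-standalone.properties spooldir-source.properties`, this moves each processed file out of the input directory, which gives simple at-least-once semantics without any custom code.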

Alternative solutions include Fluentd and Filebeat.