Support Questions


Kafka data pipeline


I want to get data from a remote Windows location into HDFS/Spark Streaming using Apache Kafka, and I want this to happen automatically as data lands in that folder. I am currently using WinSCP to load small files by hand, but to automate this I need some other tool. How can I achieve this?


@pk reddy

One way is to use Apache NiFi. You can refer to this article to get started (instead of the HDFS processor, the PublishKafka processor can be used).
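As a rough sketch, such a NiFi flow could chain three standard processors: ListFile to detect new files, FetchFile to read their contents, and PublishKafka to send them to a topic. The directory, broker address, and topic name below are placeholders, not values from this thread:

```
# NiFi flow sketch: watch a Windows folder and publish each new file to Kafka
# ListFile -> FetchFile -> PublishKafka

# ListFile (emits a flowfile for each new file in the folder)
Input Directory   : \\windows-host\shared\incoming    # placeholder share
Listing Strategy  : Tracking Timestamps

# FetchFile (reads the content of each listed file)
File to Fetch     : ${absolute.path}/${filename}

# PublishKafka (publishes the file content to a Kafka topic)
Kafka Brokers     : broker1:9092                      # placeholder broker
Topic Name        : ingest-topic                      # placeholder topic
```

ListFile keeps state about what it has already seen, so the flow picks up only files added after it starts, which matches the "automatically as the data gets into that folder" requirement.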

Super Collaborator

You could also consider using Kafka itself for all ingestion: run Kafka Connect locally on the Windows machine and point a file-source connector at the directory.
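A minimal sketch of a standalone Kafka Connect worker config for this, assuming placeholder paths and topic names. Note that the built-in FileStreamSource connector tails a single file; watching a whole directory would need a directory-watching connector such as the community "spooldir" connector:

```
# connect-file-source.properties (run with connect-standalone)
name=windows-folder-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
# FileStreamSource follows ONE file; swap in a spooldir-style connector
# if you need to ingest every file dropped into a folder
file=C:/data/incoming/events.log
topic=ingest-topic
```

It would then be started with the standalone worker, e.g. `connect-standalone.bat config\connect-standalone.properties connect-file-source.properties`, after which new lines appended to the file appear as messages on the topic.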

Alternative solutions include Fluentd or Filebeat.
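For the Filebeat route, a minimal sketch of a `filebeat.yml` that ships files from a Windows folder straight to a Kafka topic; the path, broker, and topic are placeholders:

```
# filebeat.yml - watch a Windows folder and publish new lines to Kafka
filebeat.inputs:
  - type: log
    paths:
      - C:\data\incoming\*.log     # placeholder folder/pattern

output.kafka:
  hosts: ["broker1:9092"]          # placeholder broker
  topic: "ingest-topic"            # placeholder topic
  required_acks: 1
```

Filebeat tracks its position in each file, so it handles both new files and appends to existing ones without re-sending old data.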
