Support Questions


Kafka data pipeline


I want to get data from a remote Windows location into HDFS/Spark Streaming using Apache Kafka, and I want this to happen automatically as data lands in that folder. I am currently using WinSCP to load small files by hand, but to automate this I need some other tool. How can I achieve this?


@pk reddy

One way is to use Apache NiFi. You can refer to this article to get started (instead of the HDFS processor, the PublishKafka processor can be used).
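As a rough sketch, such a NiFi flow could chain three standard processors: ListFile to detect new files, FetchFile to read their contents, and PublishKafka to send them to a topic. The directory, broker address, and topic name below are placeholders, not values from this thread:

```
# NiFi flow sketch: watch a Windows folder and publish each new file to Kafka
# ListFile -> FetchFile -> PublishKafka

# ListFile (emits a flowfile for each new file in the folder)
Input Directory   : \\windows-host\shared\incoming    # placeholder share
Listing Strategy  : Tracking Timestamps

# FetchFile (reads the content of each listed file)
File to Fetch     : ${absolute.path}/${filename}

# PublishKafka (publishes the file content to a Kafka topic)
Kafka Brokers     : broker1:9092                      # placeholder broker
Topic Name        : ingest-topic                      # placeholder topic
```

ListFile keeps state about what it has already seen, so the flow picks up only files added after it starts, which matches the "automatically as the data gets into that folder" requirement.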

Super Collaborator

You could also consider using Kafka itself for all ingestion: run Kafka Connect locally on the Windows machine and point a file-source connector at the directory.
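A minimal sketch of a standalone Kafka Connect worker config for this, assuming placeholder paths and topic names. Note that the built-in FileStreamSource connector tails a single file; watching a whole directory would need a directory-watching connector such as the community "spooldir" connector:

```
# connect-file-source.properties (run with connect-standalone)
name=windows-folder-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
# FileStreamSource follows ONE file; swap in a spooldir-style connector
# if you need to ingest every file dropped into a folder
file=C:/data/incoming/events.log
topic=ingest-topic
```

It would then be started with the standalone worker, e.g. `connect-standalone.bat config\connect-standalone.properties connect-file-source.properties`, after which new lines appended to the file appear as messages on the topic.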

Alternative solutions include Fluentd or Filebeat.
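For the Filebeat route, a minimal sketch of a `filebeat.yml` that ships files from a Windows folder straight to a Kafka topic; the path, broker, and topic are placeholders:

```
# filebeat.yml - watch a Windows folder and publish new lines to Kafka
filebeat.inputs:
  - type: log
    paths:
      - C:\data\incoming\*.log     # placeholder folder/pattern

output.kafka:
  hosts: ["broker1:9092"]          # placeholder broker
  topic: "ingest-topic"            # placeholder topic
  required_acks: 1
```

Filebeat tracks its position in each file, so it handles both new files and appends to existing ones without re-sending old data.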
