
Data pipeline for real-time/near-real-time data using Kafka and Spark Streaming


We are evaluating approaches for building a data pipeline for real-time and near-real-time data integration from sources such as mainframes, RDBMSs, and file systems.


We are reviewing Kafka and Spark Streaming to achieve this. We have many questions about using the Kafka Producer API versus Kafka Connect connectors, and about which Kafka connectors are supported in Cloudera CDK. Are connectors for file systems and RDBMSs available with CDK? Can we achieve CDC (change data capture) using Flafka (Flume + Kafka) from the Cloudera distribution?

 

Any guidance on this would be a tremendous help.


Regards

Rahul Shah