
Data pipeline for real-time/near-real-time integration using Kafka and Spark Streaming

We are evaluating approaches for building a data pipeline for real-time and near-real-time data integration from sources such as mainframes, RDBMSs, and file systems.

We are reviewing Kafka and Spark Streaming to achieve this. We have many questions about the Producer API versus Kafka connectors, and about which Kafka connectors are supported in Cloudera CDK. Does CDK include connectors for file systems and RDBMSs? Can we achieve CDC (change data capture) using Flafka (Flume + Kafka) from the Cloudera distribution?


Any guidance on this would be a tremendous help.


Rahul Shah