We are evaluating approaches for building a data pipeline for real-time and near-real-time data integration from sources such as mainframes, RDBMSs, and file systems.
We are reviewing Kafka and Spark Streaming to achieve this. We have many questions about the Producer API versus Kafka connectors, and about which Kafka connectors are supported in Cloudera CDK. Does CDK include connectors for file systems and RDBMSs? Can we achieve CDC (change data capture) using Flafka (Flume + Kafka) from the Cloudera distribution?