06-13-2018 04:42 AM
I'm using Cloudera Enterprise v5.10.1, including Kafka. We're developing some Kafka Streams applications that consume from Kafka topics and are robust, scalable, and performant. Currently these applications are deployed using an Oozie workflow and essentially run forever (i.e. I don't use a coordinator). If I want to scale one up, I can just launch another instance using Oozie.
My question: is there a better way to deploy Kafka Streams applications in Cloudera v5.10.1? I know there are frameworks like Mesos, Docker, etc., but I'm not sure how well they would fit in with Cloudera.
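To illustrate why launching another instance is enough to scale: Kafka Streams instances that share the same `application.id` join one consumer group and divide the input partitions among themselves. The sketch below only builds the configuration such an instance would be started with. The application name, broker address, and instance labels are hypothetical placeholders, not values from this deployment.

```java
import java.util.Properties;

class StreamsInstanceConfig {

    // Hypothetical values for illustration only.
    static final String APPLICATION_ID = "my-streams-app";
    static final String BOOTSTRAP_SERVERS = "broker1:9092";

    /** Builds the configuration for one running instance of the application. */
    static Properties buildConfig(String clientId) {
        Properties props = new Properties();
        // Identical on every instance: instances sharing an application.id
        // form one consumer group and split the input partitions between them.
        props.setProperty("application.id", APPLICATION_ID);
        props.setProperty("bootstrap.servers", BOOTSTRAP_SERVERS);
        // Differs per instance; used to tell instances apart in logs and metrics.
        props.setProperty("client.id", clientId);
        return props;
    }

    public static void main(String[] args) {
        Properties first = buildConfig("instance-1");
        Properties second = buildConfig("instance-2");
        // Both instances report the same application.id, so they share the work.
        System.out.println(first.getProperty("application.id")
                .equals(second.getProperty("application.id")));
    }
}
```

With this model the useful number of instances is bounded by the number of input partitions, so any additional instances would be expected to sit idle.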
11-23-2018 07:35 AM - edited 11-23-2018 07:35 AM
Starting with version 2.0.0, Kafka Streams has been included with CDK Powered By Apache Kafka. However, it is not supported and is not tested in our software. Deploying Kafka Streams applications is therefore possible but not guaranteed to work, and the community is unlikely to be able to provide recommendations on how to launch and scale these applications.
Spark Streaming is the tried and tested technology for processing streams of data, and Cloudera has around a hundred customers using Spark Streaming in production settings. In addition, with Spark Streaming, Cloudera's customers benefit from the superior Spark ecosystem, libraries, and machine learning support.
Depending on your use case, there may be a way to scale Spark Streaming to fit your requirements.