A post from 2016 outlined the broad differences between the CDH and Confluent platforms for Kafka. The default Kafka package included in CDH, which can be activated via Cloudera Manager, allows me to create ZooKeeper and broker instances, create topics, post Avro messages to topics and consume them. I have been able to create producers and consumers in Python, but I had to install kafka-python to do so.
1. Will the Kafka Streams API be part of CDH, or is this something I have to install separately?
2. Is Spark Streaming the preferred streaming platform for CDH?
3. Will Schema Registry and REST Proxy be part of CDH, or do I install them separately?
4. Will KSQL be available out of the box on CDH, or do I install it separately?
Pardon my ignorance. Pretty new to the "big data" ecosystem and overwhelmed by the plethora of tools.
While these components are open source and you are free to test them and install them in Cloudera Kafka, they are not supported and have not been tested by Cloudera's dev team.
1 & 2) Starting with version 2.0.0, Kafka Streams has been included with CDK Powered By Apache Kafka; however, it is not supported. Spark Streaming is the tried and tested technology for processing streams of data, and Cloudera has around a hundred customers using Spark Streaming in production settings. In addition, with Spark Streaming, Cloudera's customers are able to benefit from the wider Spark ecosystem, including its libraries and machine learning support.
3) Schema Registry is currently not supported. As it is open source, users have the option to install it separately in an unsupported manner or to build their own lightweight alternative as per:
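
For a sense of scale, a home-grown registry can start out very small. The sketch below is an illustration only: the class name `InMemorySchemaRegistry` and its methods are hypothetical, it is not Confluent's Schema Registry API, it keeps everything in memory, and it does no compatibility checking. It assumes schemas are Avro schemas expressed as plain JSON-compatible dicts, and it only shows the core idea of assigning a stable ID to each distinct schema and tracking versions per subject.

```python
import hashlib
import json


class InMemorySchemaRegistry:
    """Hypothetical, minimal stand-in for a schema registry (not Confluent's API)."""

    def __init__(self):
        self._ids_by_fingerprint = {}  # fingerprint -> schema id
        self._schemas_by_id = {}       # schema id -> schema (dict)
        self._versions = {}            # subject -> list of schema ids, oldest first

    @staticmethod
    def _fingerprint(schema):
        # Sort object keys so equivalent JSON documents hash identically.
        # This is a simplification, not Avro's Parsing Canonical Form.
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def register(self, subject, schema):
        """Store the schema under a subject and return its numeric id."""
        fp = self._fingerprint(schema)
        schema_id = self._ids_by_fingerprint.get(fp)
        if schema_id is None:
            # First time this exact schema is seen: allocate the next id.
            schema_id = len(self._schemas_by_id) + 1
            self._ids_by_fingerprint[fp] = schema_id
            self._schemas_by_id[schema_id] = schema
        versions = self._versions.setdefault(subject, [])
        if schema_id not in versions:
            versions.append(schema_id)
        return schema_id

    def get_schema(self, schema_id):
        """Look up a schema by the id a producer embedded in a message."""
        return self._schemas_by_id[schema_id]

    def latest(self, subject):
        """Return (id, schema) for the most recently registered version."""
        schema_id = self._versions[subject][-1]
        return schema_id, self._schemas_by_id[schema_id]


# Example usage with two versions of a made-up "User" record.
registry = InMemorySchemaRegistry()
user_v1 = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "id", "type": "long"}],
}
user_v2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}
id_v1 = registry.register("users-value", user_v1)
id_v1_again = registry.register("users-value", user_v1)  # same schema, same id
id_v2 = registry.register("users-value", user_v2)
```

A producer would send the returned id alongside each serialized message, and a consumer would call `get_schema` with that id before decoding. A real deployment would also need persistent storage, a network interface, and schema compatibility rules, none of which are covered here.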