02-20-2018 01:58 PM
I heard that Cloudera is working on Kubernetes as a platform. Is this true? If so, is there any news or updates? I would like to know if and when it will replace YARN. We are currently moving to Kubernetes to underpin all our services. It would be beneficial, and simpler to maintain Kubernetes, if Cloudera Manager handled this.
03-15-2018 06:08 PM
Thanks for your interest. Cloudera is indeed in the early stages of looking at Kubernetes to see how it might benefit Cloudera users. Nothing definite thus far.
You touched on the largest challenge: YARN. For Spark, a Spark-on-K8s project is making rapid progress on integrating these two tools. The result will be that YARN is needed only for MapReduce (MR). At present, there is no clear community solution for MR on Kubernetes, so we're looking into options.
You are right that some changes would be needed to Cloudera Manager (CM): CM need not be in the business of launching processes; CM would instead coordinate with K8s to launch containers.
It would be helpful to understand a bit more about how you'd want to use Kubernetes.
In your own deployment, do you use Spark? MR (perhaps via Hive)? Other distributed compute engines?
Would you want Kubernetes to manage your HDFS data nodes (which would require associating pods with the nodes that have disks), or would you use some other storage solution?
About how large would your cluster be (rough order-of-magnitude: 10, 50, 100, etc.)?
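To make the data node question concrete, here is a minimal sketch of what "associating pods with the nodes that have disks" could look like. It is only an illustration, not a Cloudera design: it builds a plain Kubernetes Pod manifest that uses the standard `nodeSelector` and `hostPath` fields. The label key `example.com/has-data-disks`, the image name, and the `/data/dn` path are all hypothetical placeholders.

```python
import json


def datanode_pod_manifest(
    name,
    disk_label_key="example.com/has-data-disks",  # hypothetical node label
    image="example/hdfs-datanode:latest",         # hypothetical image name
    data_path="/data/dn",                         # hypothetical disk mount path
):
    """Build a Pod manifest (as a dict) pinned to nodes labeled as having data disks."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "hdfs-datanode"}},
        "spec": {
            # Schedule only onto nodes that carry the disk label.
            "nodeSelector": {disk_label_key: "true"},
            "containers": [
                {
                    "name": "datanode",
                    "image": image,
                    "volumeMounts": [{"name": "dn-data", "mountPath": data_path}],
                }
            ],
            # Expose the node's local disk directory to the container.
            "volumes": [{"name": "dn-data", "hostPath": {"path": data_path}}],
        },
    }


if __name__ == "__main__":
    # Emit JSON, which kubectl accepts as well as YAML.
    print(json.dumps(datanode_pod_manifest("datanode-0"), indent=2))
```

In this sketch, the nodes with disks would first need to be labeled to match the selector; whether a simple `nodeSelector` is enough, or something like local persistent volumes is a better fit, depends on the answers to the questions above.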