We are brainstorming the best way to implement our streaming and processing layer (including ad-hoc reports) and our DWH cluster.
The question is between two approaches. The first is a unified cluster over HDFS that serves the stream app via YARN and the DWH via Kubernetes (the DWH is a working, closed solution, so it is a given that it will use Kubernetes).
Note that app A (the stream app) will have a completely different SLA and QoS than app B (the DWH).
The other approach is two separate clusters, which will potentially demand more management overhead.
So the questions are:
1. Is it feasible to accommodate both workloads on the same cluster, with a logical boundary between the two for different resource utilization and QoS?
2. If both apps use Spark, can Spark use Kubernetes for app B and YARN for app A? (See the sketch after this list for what I have in mind.)
3. Is there a known example of such a division in production?
4. I saw this link; has there been any progress on this issue since? https://hortonworks.com/blog/docker-kubernetes-apache-hadoop-yarn/
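
To make question 2 concrete, here is a minimal PySpark sketch of what I have in mind. It assumes a Spark version that supports client-mode submission to Kubernetes through SparkSession (2.4+); the queue name, namespace, container image, and API-server URL are hypothetical placeholders, and each snippet would run as its own driver process.

from pyspark.sql import SparkSession

# Driver for app A (stream app): submit to YARN, pinned to a dedicated
# capacity-scheduler queue so it gets its own resource share and QoS.
spark_a = (
    SparkSession.builder
    .appName("stream-app")
    .master("yarn")
    .config("spark.yarn.queue", "streaming")  # hypothetical queue defined in capacity-scheduler.xml
    .getOrCreate()
)

# Driver for app B (DWH): submit to Kubernetes instead, in its own namespace.
# Run this in a separate process; a single JVM holds only one SparkContext.
spark_b = (
    SparkSession.builder
    .appName("dwh-app")
    .master("k8s://https://kube-apiserver.example.com:6443")  # hypothetical API server
    .config("spark.kubernetes.namespace", "dwh")  # hypothetical namespace
    .config("spark.kubernetes.container.image", "example/spark:latest")  # hypothetical image
    .getOrCreate()
)

Since the resource manager is chosen per application via the master URL, mixing YARN for app A and Kubernetes for app B looks possible at the API level; whether the two schedulers can cleanly share the same nodes is exactly the part I am unsure about.
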
Hi Yair, this does not answer your question, but it may help you understand where HDP 3.0 is heading this year. See this blog post - https://hortonworks.com/blog/data-lake-3-0-part-2-multi-colored-yarn/ - and the fourth post in the series, which links to the other three: https://hortonworks.com/blog/data-lake-3-0-part-4-cutting-storage-overhead-in-half-with-hdfs-erasure...