I am new to this whole containerization methodology and am wondering whether my intended cluster architecture is based on a misconception. We have an OpenShift cluster, which runs Docker containers.
What I want to do is the following:
Unfortunately, we cannot use Hortonworks Cloudbreak with OpenShift, as the two are not compatible at the moment, so this is my workaround. What I would obtain is a containerized cluster that should just behave like typical (virtual) servers - is this assumption correct?
What are the pitfalls of a solution like this? It looks good on paper, and I cannot think of any major limitations right now. We have dedicated storage nodes that can persist the data written inside the containers.
I want to ask more specifically:
So is it a problem to start with a bare RHEL Docker container and successively add/install new packages to it, i.e. add Ambari Agent, HDFS, Hive, etc. to one single container? If I understood correctly, this is what Hortonworks Cloudbreak does.
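To make the layering idea concrete, here is a minimal, hypothetical Dockerfile sketch; the base image tag, repository URL, and package names are assumptions for illustration, not verified Hortonworks artifacts:

```dockerfile
# Hypothetical sketch: start from a bare RHEL-like base and layer services on top.
# Base image, repo URL, and package names below are illustrative assumptions.
FROM registry.access.redhat.com/rhel7

# Prerequisites typical for Hadoop-style services
RUN yum install -y java-1.8.0-openjdk openssh-clients && yum clean all

# Add the Ambari repository and install the agent (URL is a placeholder)
# RUN curl -o /etc/yum.repos.d/ambari.repo http://example.com/ambari.repo
# RUN yum install -y ambari-agent

# Further layers would add HDFS, Hive, etc. in the same way
CMD ["/bin/bash"]
```

Each `RUN` adds an image layer, so this is essentially the "successively add packages" workflow, just captured in a reproducible build file instead of manual installs.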
This would make provisioning and managing our cluster much easier. We would move to separate Docker containers per service once the solution with the "big" Docker container works.
It is not a problem to have all the services in a single container. It might be a Docker anti-pattern, but it will work, and you can go that way.
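One practical consequence: a container normally runs a single foreground process, so a "fat" container with several services usually needs a small supervisor-style entrypoint (or a real init like supervisord). A minimal sketch, where the `sleep` commands are placeholders standing in for the real service daemons:

```shell
#!/bin/sh
# Hypothetical entrypoint for a single container running several services.
# The sleep commands stand in for real daemons such as ambari-agent or a datanode.

start_service() {
  # Launch the given command in the background and report its PID
  "$@" &
  echo "started: $1 (pid $!)"
}

start_service sleep 1   # stand-in for e.g. ambari-agent
start_service sleep 1   # stand-in for e.g. an HDFS datanode

# Keep the container's main process alive while the children run
wait
```

If any one service dies, nothing restarts it here, which is exactly why this pattern is considered an anti-pattern compared to one service per container.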
Cloudbreak has not created HDP clusters in Docker containers for about two years now. Before that, CB did deploy into containers, but there were stability and enterprise-supportability issues with that approach at the time, which may have been resolved since.
Hope this helps!