HDP - Environments segmentation

Explorer

Hi Folks,

I'm wondering how you guys are dealing with environment segmentation in a big data world in a cloud context (yes, I'm relatively new to this).

Do you have separate Dev, QA, and Prod clusters, each in its own subnet or even VNet, with edge nodes for each environment in a DMZ subnet?
Or maybe one cluster, with dev, qa, and prod folders on HDFS and separate YARN queues, backed by Ranger?

1 REPLY

Cloudera Employee

While the multi-tenant features of HDP (e.g. YARN capacity scheduler, Ranger policies, HDFS quotas, etc.) could be used to combine Dev/QA/Prod environments into a single cluster, it is generally not recommended.
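For anyone curious what the single-cluster option would involve on the YARN side, here is a rough sketch in Python that builds Capacity Scheduler properties for three queues. The queue names (prod, qa, dev) and the percentage split are purely hypothetical, and the helper `capacity_scheduler_props` is made up for illustration. Only the `yarn.scheduler.capacity.root...` property keys come from the Capacity Scheduler itself. Ranger policies and HDFS quotas would still have to be set up separately.

```python
def capacity_scheduler_props(shares):
    """Build Capacity Scheduler properties for child queues of root.

    shares maps queue name -> (capacity %, maximum-capacity %).
    The capacities of sibling queues must add up to 100.
    """
    total = sum(cap for cap, _ in shares.values())
    if total != 100:
        raise ValueError(f"queue capacities must sum to 100, got {total}")

    prefix = "yarn.scheduler.capacity.root"
    props = {f"{prefix}.queues": ",".join(shares)}
    for queue, (cap, max_cap) in shares.items():
        if max_cap < cap:
            raise ValueError(f"{queue}: maximum-capacity is below capacity")
        # Guaranteed share of cluster resources for this queue.
        props[f"{prefix}.{queue}.capacity"] = str(cap)
        # Upper bound the queue may grow to when other queues are idle.
        props[f"{prefix}.{queue}.maximum-capacity"] = str(max_cap)
    return props


# Hypothetical split: prod is guaranteed most of the cluster,
# while qa and dev are capped so they cannot crowd it out.
props = capacity_scheduler_props(
    {"prod": (60, 100), "qa": (20, 40), "dev": (20, 40)}
)
for key, value in props.items():
    print(f"{key}={value}")
```

The printed key=value pairs correspond to entries you would place in capacity-scheduler.xml (or set through Ambari). Note that this only partitions compute. It does nothing to isolate version upgrades or configuration changes between environments.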

Managing a single cluster instead of three seems easier on the surface, but it is really not worth it. First of all, where are developers going to test against new versions if you only have one cluster? Combining Dev and QA may be an option, but that is more of an organizational decision.

A configuration I like is Prod, DR/Ad-hoc, and Dev/QA. Most companies require a DR environment in sync with production. By making that DR environment read-only, you can run exploratory analytics and/or data science workloads using resources that would otherwise sit idle. Additionally, pulling the lower-priority and unpredictable workloads out of production reduces the risk of missing SLAs.

Of course, all of this is use case dependent, and your mileage may vary. The best thing about "big data" technologies is how customizable and broadly applicable they are, and the worst thing is how customizable and broadly applicable they are 🙂
