Best Practice for Dev, QA and Production for a Hadoop Cluster

Since Hadoop is not typical enterprise software, we are having trouble getting the QA team to understand how it fits into our application landscape. They would like us to have three separate environments for Dev, QA and Production. Do you typically see this, or do you have any best practice documentation that we could provide to them?

1 ACCEPTED SOLUTION

@Ancil McBarnett I see this pattern with all my customers. Dev tends to be small. Sometimes dev consists only of sandbox instances, and it is almost always a virtual environment. Test mimics prod in all configuration aspects but tends to have about 30%-50% of prod capacity.

Upgrades, configuration changes, patching, and tech previews all occur in the test environment prior to any production rollout. In the end, Hadoop isn't much different from other platforms as far as this is concerned.

6 REPLIES

@Ancil McBarnett

Yes to three environments.

Dev and QA do not need to be as big as prod.

DR is required too, and the DR cluster can also be used for reporting.

You have to decide how many clusters you need for the tasks below, which apply to Hadoop applications the same way they apply to typical enterprise software:
  1. Test upgrade procedures for new versions of existing components
  2. Execute performance tests of custom-built applications
  3. Allow end-users to perform user acceptance testing
  4. Execute integration tests where custom-built applications communicate with third-party software
  5. Experiment with new software that is beta quality and may not be ready for usage at all
  6. Execute security penetration tests (typically done by an external company)
  7. Let application developers modify configuration parameters and restart services on short notice
  8. Maintain a mirror image of the production environment to be activated in case of natural disaster or unforeseen events
  9. Execute regression tests that compare the outputs of new application code with existing code running in production

I believe DEV -> QA -> PROD is the minimum, and I have seen larger organizations deploy LAB -> DEV -> QA -> PROD -> DR as separate clusters.
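
One way to work through that decision is to write down which cluster would host each of the nine tasks and see which clusters end up being needed. The sketch below does this. The assignment in `ASSIGNMENT` is purely a hypothetical example, not a recommendation from this thread, and every organization will map the tasks differently.

```python
from collections import defaultdict

# Cluster order taken from the pipeline described above.
PIPELINE = ["LAB", "DEV", "QA", "PROD", "DR"]

# Hypothetical assignment of the nine tasks to clusters (illustration only).
ASSIGNMENT = {
    "test upgrade procedures": "QA",
    "performance tests": "QA",
    "user acceptance testing": "QA",
    "integration tests with third-party software": "QA",
    "experiment with beta software": "LAB",
    "security penetration tests": "QA",
    "developers change configs and restart services": "DEV",
    "mirror image of production": "DR",
    "regression tests against production output": "QA",
}


def group_by_cluster(assignment):
    """Group tasks by cluster, returned in pipeline order."""
    grouped = defaultdict(list)
    for task, cluster in assignment.items():
        if cluster not in PIPELINE:
            raise ValueError(f"unknown cluster: {cluster}")
        grouped[cluster].append(task)
    # Production always exists, even though no task in the list targets it.
    grouped.setdefault("PROD", [])
    return {name: grouped[name] for name in PIPELINE if name in grouped}


for cluster, tasks in group_by_cluster(ASSIGNMENT).items():
    print(cluster, len(tasks))
```

With this example mapping, all five clusters appear. Moving the LAB task onto DEV and dropping the DR task would collapse the result to the DEV -> QA -> PROD minimum.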

@Ancil McBarnett Please accept the best answer.

@Neeraj Sabharwal

Could you elaborate further on how the DR cluster can be used for reporting?

Many thanks

Having four environments (development, testing, pre-production/staging, and production) is good practice in a large company, because staging lets us confirm that everything works properly before it reaches production. Of course, the dev, testing, and staging environments are smaller than the planned production environment. For instance, with 2 nodes each in dev, testing, and staging, production might have about 8 nodes, though this always depends on replication, traffic, and other relevant factors. Thanks!