07-21-2015 04:16 AM - edited 07-21-2015 04:18 AM
I would like to know the minimum recommended configuration for setting up a sandbox / training environment for the Cloudera big data platform.
For example, let's assume I want to have about 500 active users whose levels of experience vary from novice to expert. What's the recommended configuration?
- How many NameNodes, and with what configuration?
- How many DataNodes, and with what configuration?
Thank You !
07-21-2015 04:19 AM
That's really a how-long-is-a-piece-of-string question, and it can't be answered as posed. The number of users doesn't matter; what matters is how much data you are storing, what you're doing with it during training, and how many people are using it at once.
07-21-2015 04:40 AM
07-21-2015 04:44 AM
100MB is very tiny; that's a total of 20GB of storage. A single drive these days carries 50 times as much storage.
Do you mean 100GB? Then you need 20TB of storage. That fits into a couple of machines.
It's still really the workload that drives your requirements, though, so I can't say for sure; but based on storage alone (assuming you meant GB), this sounds like a cluster of 2-3 big commodity machines.
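For what it's worth, the storage arithmetic above can be sketched in a few lines of Python. This is a rough sketch, not a sizing tool, and several inputs are assumptions: the user count of 200 is inferred from the 20GB total at 100MB per user, the 1TB drive size is inferred from "50 times" 20GB, and the helper names are hypothetical. It also multiplies by HDFS's default replication factor of 3 (`dfs.replication`), which the figures above don't include, so raw disk needs come out three times higher than the logical totals.

```python
import math

# Hypothetical helpers; decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB).

def logical_storage_mb(users: int, per_user_mb: int) -> int:
    """Total data if every user stores per_user_mb megabytes."""
    return users * per_user_mb

def raw_storage_mb(logical_mb: int, replication: int = 3) -> int:
    """HDFS keeps `replication` copies of each block (dfs.replication defaults to 3)."""
    return logical_mb * replication

def drives_needed(raw_mb: int, drive_tb: int) -> int:
    """Whole drives of drive_tb terabytes needed to hold raw_mb megabytes."""
    drive_mb = drive_tb * 1_000_000
    return math.ceil(raw_mb / drive_mb)

if __name__ == "__main__":
    users = 200  # assumption: implied by 20 GB total / 100 MB per user
    for per_user_mb in (100, 100_000):  # 100 MB and 100 GB per user
        logical = logical_storage_mb(users, per_user_mb)
        raw = raw_storage_mb(logical)
        print(f"{per_user_mb} MB/user: {logical / 1000:g} GB logical, "
              f"{raw / 1000:g} GB raw at 3x, "
              f"{drives_needed(raw, 1)} x 1 TB drives")
```

Disk capacity is only one input, though; as noted above, the workload (CPU, memory, concurrency) usually decides the node count.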
07-21-2015 05:08 AM
I just wanted to know one more thing.
How much RAM will be consumed on a master (NameNode) when all the services below are running,
assuming there are absolutely no operations running?
07-21-2015 05:23 AM
Again, it depends on what you are doing with them. The daemons might consume a few gigabytes as a baseline, each; of course, they can consume hundreds of gigabytes if you're running workloads that need a bunch of memory.