
Hadoop sizing and costs


Hi

I am trying to get a cost estimate for a Hadoop cluster, and for that I need a rough server and storage configuration.

My needs:

1. Store 100 TB of data (so if I use a replication factor of 2, the total should be 200 TB of raw disk)

2. The data will be saved in Parquet format

3. For querying the data we want to use Presto/Hive

4. Performance:

a. The customer wants to query 10 TB of data in at most 1 hour, which means the system should scan roughly 3,000 MB/sec (10,000,000 MB / 3,600 s ≈ 2,800 MB/sec, rounded up)

b. While queries are running, other ETL processes will load data into Hadoop at a much slower rate (2-5 MB/sec) using Sqoop, written as Parquet
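A quick sanity check of the arithmetic behind these requirements (a sketch; the decimal-TB convention and the 25% scratch-space headroom are my assumptions, not from the post):

```python
# Back-of-the-envelope sizing check for the numbers above.
REPLICATION = 2          # the post's choice; note the HDFS default is 3
DATA_TB = 100

raw_tb = DATA_TB * REPLICATION           # 200 TB of raw disk
raw_with_headroom_tb = raw_tb * 1.25     # ~250 TB if you leave scratch/temp space
                                         # (25% headroom is a rule of thumb I'm assuming)

# Required scan rate: 10 TB (decimal) in 1 hour
scan_mb_per_s = 10 * 1_000_000 / 3600    # ~2,778 MB/s, i.e. ~3,000 rounded up

print(raw_tb, raw_with_headroom_tb, round(scan_mb_per_s))  # 200 250.0 2778
```

Dividing that scan rate by the sustained read throughput you expect per worker node gives a first lower bound on the number of data nodes.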

I am trying to work out:

1. How many NameNodes do I need, and with how many CPUs, how many disks of what capacity, and how much RAM in each?

2. How many slave (worker) nodes do I need, and with how many CPUs, how many disks of what capacity, and how much RAM in each?

3. How many ResourceManager nodes do I need, and with how many CPUs, how many disks of what capacity, and how much RAM in each?

4. Do I also need to configure RAID on the nodes' storage?

5. Do I need any other types of servers?

Thanks a lot!