Minimal number of HDD drives for HDFS grid


We will be setting up an HDP 3 cluster, and the idea is to split 1xRAID1 and 2xHDD across 2 virtual machines. The RAID1 volume would hold the OS and local user files, while each of the two HDDs would be used for the HDFS /grid/ storage.

My question: is 2xHDD enough to ensure file redundancy? I've read that Hadoop 3 no longer requires 3 replicas as Hadoop 2 did. In case of a drive failure, would the files on the remaining HDD still be enough for a full recovery?
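
To make the question concrete, here is a toy model of the scenario, not HDFS code. It assumes each HDD backs its own DataNode and that the replicas of a block always land on different DataNodes; the helper names `place_blocks` and `survives_failure` are made up for illustration.

```python
def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin (toy placement)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive indices modulo len(nodes) are distinct while replication <= len(nodes).
        placement[block] = {nodes[(block + i) % len(nodes)] for i in range(replication)}
    return placement


def survives_failure(placement, failed_node):
    """True if every block still has at least one replica after `failed_node` is lost."""
    return all(replicas - {failed_node} for replicas in placement.values())


nodes = ["hdd1", "hdd2"]  # hypothetical: one DataNode per HDD
for replication in (1, 2):
    placement = place_blocks(8, nodes, replication)
    ok = all(survives_failure(placement, node) for node in nodes)
    print(f"replication={replication}: survives any single drive failure -> {ok}")
```

In this model, two drives with a replication factor of 2 survive the loss of either drive, while a factor of 1 does not. Whether real HDFS behaves the same way in this layout is what I would like confirmed.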

Or would you advise partitioning part of the RAID1 volume as a third /grid/ drive for HDFS? That seems to introduce a lot of overhead because of the RAID.

What is the minimum number of HDDs required for redundant HDFS operation?