Created 04-05-2017 01:08 AM
Additional Questions:
Master 1 (NameNode)
RAID configuration:
Worker 2 (DataNode1)
RAID configuration:
Worker 3 (DataNode2)
RAID configuration:
Created 04-07-2017 01:00 AM
For NN data, a fault-tolerant RAID level such as 1, 5, or 10 is fine. On worker nodes, for HDFS data you should use JBOD or RAID-0 per disk (so that you have 3 mount points). RAID-1 for the OS on all nodes is fine. I'm not sure what you mean by "caching".
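As a rough illustration of the "one mount point per disk" layout, the sketch below builds the `dfs.datanode.data.dir` value for `hdfs-site.xml`, which takes a comma-separated list of directories (one per disk). The mount points `/grid/0` to `/grid/2` and the `hadoop/hdfs/data` subdirectory are hypothetical; substitute your own paths.

```python
from xml.sax.saxutils import escape

# Hypothetical per-disk mount points on one worker node: one JBOD disk
# (or single-disk RAID-0 volume) per mount point, three in total.
MOUNT_POINTS = ["/grid/0", "/grid/1", "/grid/2"]


def datanode_data_dirs(mounts, subdir="hadoop/hdfs/data"):
    """Build a comma-separated dfs.datanode.data.dir value.

    Each disk gets its own directory, so the DataNode spreads blocks across
    independent disks instead of relying on RAID striping or mirroring.
    The subdir default is a hypothetical example path.
    """
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mounts)


def hdfs_site_property(name, value):
    """Render a single <property> element for hdfs-site.xml."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    print(hdfs_site_property("dfs.datanode.data.dir",
                             datanode_data_dirs(MOUNT_POINTS)))
```

If one of those disks fails, only the blocks in that directory are affected, and HDFS re-replicates them from copies on other nodes.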
Created 04-05-2017 03:52 AM
Ping @Arpit Agarwal
Created 04-05-2017 11:32 PM
Created 04-07-2017 06:53 AM
Thanks! @Tom N
Created 04-06-2017 04:33 PM
Using RAID can reduce the availability and fault tolerance of HDFS, and it certainly reduces overall performance compared with JBOD.
We strongly recommend configuring your disks as JBOD since HDFS already stores data redundantly by replicating across nodes/racks and can automatically recover from disk and node failures.
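The redundancy argument can be put into rough numbers. This is a minimal sketch, assuming the default `dfs.replication` of 3, a hypothetical 24 TB of raw DataNode disk, and no reserved or non-DFS space: mirroring every disk with RAID-1 underneath HDFS replication halves the usable capacity, while replication already covers disk failures.

```python
def usable_capacity_tb(raw_tb, replication=3, raid_copies=1):
    """Approximate usable HDFS capacity for a given raw disk capacity.

    raw_tb:      total raw DataNode disk capacity, in TB
    replication: HDFS replication factor (dfs.replication, default 3)
    raid_copies: physical copies RAID keeps under each HDFS replica
                 (1 for JBOD or single-disk RAID-0, 2 for RAID-1 mirrors)

    Simplification: ignores reserved space, non-DFS usage and temp data.
    """
    if replication < 1 or raid_copies < 1:
        raise ValueError("replication and raid_copies must be >= 1")
    return raw_tb / (replication * raid_copies)


if __name__ == "__main__":
    raw = 24.0  # hypothetical total raw capacity in TB
    print("JBOD:  ", usable_capacity_tb(raw), "TB")
    print("RAID-1:", usable_capacity_tb(raw, raid_copies=2), "TB")
```

With these assumptions JBOD leaves 8 TB usable and RAID-1 leaves 4 TB, for the same number of disks.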
Created 04-07-2017 12:52 AM
In my configuration, is it okay to use RAID 1 for the OS and for caching, or should I change it to RAID 0?
Created 04-07-2017 06:53 AM
Thanks! @Arpit Agarwal
Created 04-07-2017 04:46 AM
@Predrag Minovic So my config should look like this:
Master 1 (NameNode)
RAID configuration:
Worker 2 (DataNode1)
RAID configuration:
Worker 3 (DataNode2)
RAID configuration:
Should we also remove RAID 1 from the SSDs?
Created 04-07-2017 05:41 AM
Okay, now I understand what you mean by "caching". Yes, you can remove RAID-1 on the SSDs and then experiment with the ONE_SSD and ALL_SSD storage policies. Either way there are multiple replicas, so there is no need for RAID. By the way, storage policies don't apply to the NN, so if possible it would be good to move the 2x400G SSDs from the NN to the worker nodes.
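To see what the two policies mean for replica placement, here is a simplified model based on the built-in policy definitions in the Hadoop archival storage documentation: ONE_SSD asks for SSD and then DISK, ALL_SSD asks for SSD only, and the default HOT policy uses DISK. It ignores fallback storage types and rack placement, so treat it as an illustration rather than the NameNode's actual placement code. The SSD and disk paths are hypothetical.

```python
# Storage types requested by a few built-in HDFS storage policies
# (fallback storage types omitted for brevity).
POLICY_STORAGE_TYPES = {
    "HOT": ["DISK"],
    "ONE_SSD": ["SSD", "DISK"],
    "ALL_SSD": ["SSD"],
}


def replica_storage_types(policy, replication=3):
    """Return the storage type requested for each replica under a policy.

    Replica i uses the i-th listed type; once the list runs out, the last
    type is repeated. So ONE_SSD gives one SSD replica with the rest on
    DISK, while ALL_SSD puts every replica on SSD.
    """
    types = POLICY_STORAGE_TYPES[policy]
    return [types[min(i, len(types) - 1)] for i in range(replication)]


def tagged_data_dirs(ssd_dirs, disk_dirs):
    """Build a dfs.datanode.data.dir value with [SSD]/[DISK] storage tags."""
    tagged = [f"[SSD]{d}" for d in ssd_dirs] + [f"[DISK]{d}" for d in disk_dirs]
    return ",".join(tagged)


if __name__ == "__main__":
    for name in ("ONE_SSD", "ALL_SSD"):
        print(name, replica_storage_types(name))
    # Hypothetical worker layout: one SSD plus three data disks.
    print(tagged_data_dirs(["/ssd/0"], ["/grid/0", "/grid/1", "/grid/2"]))
```

For the SSDs to be used at all, their DataNode directories need the `[SSD]` tag in `dfs.datanode.data.dir`; a policy can then be set on a path with `hdfs storagepolicies -setStoragePolicy -path <path> -policy ONE_SSD`.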