Is our Hadoop configured correctly? JBOD vs RAID

Contributor

Additional Questions:

  1. Do we need to set up RAID 1 on our SSDs?
  2. Is our Hadoop configuration okay with RAID 1/5?

Master 1 (NameNode)

RAID Configuration:

  • 2 x 300GB – RAID 1 (OS)
  • 2 x 400GB SSD RAID 1 (Caching)
  • 3 x 2TB RAID 5 (Hadoop)

Worker 2 (DataNode1)

RAID Configuration:

  • 2 x 300GB – RAID 1 (OS)
  • 2 x 400GB SSD RAID 1 (Caching)
  • 3 x 2TB RAID 5 (Hadoop)

Worker 3 (DataNode2)

RAID Configuration:

  • 2 x 300GB – RAID 1 (OS)
  • 2 x 400GB SSD RAID 1 (Caching)
  • 3 x 2TB RAID 5 (Hadoop)

Contributor

Thanks! @Tom N


Using RAID can reduce the availability and fault tolerance of HDFS. It certainly reduces the overall performance as compared to JBOD.

We strongly recommend configuring your disks as JBOD since HDFS already stores data redundantly by replicating across nodes/racks and can automatically recover from disk and node failures.
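As a rough illustration of the capacity side of this recommendation, the sketch below compares the raw space one worker exposes to HDFS with the 3 x 2TB disks from the question, first as RAID 5 and then as JBOD. The function names are made up for this example, and it covers capacity only, not performance or failure behaviour.

```python
def raid5_usable_tb(disks: int, size_tb: float) -> float:
    """RAID 5 spends one disk's worth of space on parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * size_tb


def jbod_usable_tb(disks: int, size_tb: float) -> float:
    """JBOD exposes every disk in full; HDFS replication supplies the redundancy."""
    return disks * size_tb


# Disk layout taken from the question: 3 x 2TB per worker.
raid5 = raid5_usable_tb(3, 2.0)
jbod = jbod_usable_tb(3, 2.0)
print(f"RAID 5: {raid5} TB per worker, JBOD: {jbod} TB per worker")
```

With RAID 5 a third of each worker's data disks goes to parity, on top of the copies HDFS already keeps on other nodes.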

Contributor

In my configuration, is RAID 1 on the OS and caching disks okay, or should I change it to RAID 0?

Contributor

Thanks! @Arpit Agarwal

Master Guru

For NN data, a fault-tolerant RAID level such as 1, 5, or 10 is fine. On worker nodes, use JBOD for HDFS data, or a separate single-disk RAID-0 volume per disk, so that you end up with 3 mount points. RAID-1 for the OS on all nodes is fine. I'm not sure what you mean by "caching".
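With one mount point per disk, the DataNode is given one directory per disk through the `dfs.datanode.data.dir` property in hdfs-site.xml, as a comma-separated list. A minimal sketch of how that value could be assembled, assuming hypothetical mount points /data/1 to /data/3 and a hypothetical `dn` subdirectory on each (neither comes from this thread):

```python
def data_dir_value(mount_points, subdir="dn"):
    """Join one directory per disk into a comma-separated dfs.datanode.data.dir value."""
    return ",".join(f"{mp.rstrip('/')}/{subdir}" for mp in mount_points)


# Hypothetical mount points, one per 2TB data disk.
mounts = ["/data/1", "/data/2", "/data/3"]
value = data_dir_value(mounts)
print(value)
```

Because each directory sits on its own disk, the DataNode can spread blocks across all three, and a single failed disk affects only the blocks stored on it.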

Contributor

@Predrag Minovic So my config should look like this:

Master 1 (NameNode)

RAID Configuration:

  • 2 x 300GB – RAID 1 (OS)
  • 2 x 400GB SSD RAID 1 (Caching) - HOT (for HDFS)
  • 3 x 2TB RAID 5 (Hadoop) - one mount point WARM, COLD, ARCHIVE

Worker 2 (DataNode1)

RAID Configuration:

  • 2 x 300GB – RAID 1 (OS)
  • 2 x 400GB SSD RAID 1 (Caching) - HOT (for HDFS)
  • 3 x 2TB RAID 0 (Hadoop) - 3 mount points WARM, COLD, ARCHIVE

Worker 3 (DataNode2)

RAID Configuration:

  • 2 x 300GB – RAID 1 (OS)
  • 2 x 400GB SSD RAID 1 (Caching) - HOT (for HDFS)
  • 3 x 2TB RAID 0 (Hadoop) - 3 mount points WARM, COLD, ARCHIVE

Should we also remove RAID 1 from the SSDs?

Master Guru

Okay, now I understand what you mean by "caching". Yes, you can remove RAID-1 on the SSDs, and then you can experiment with the One_SSD and All_SSD storage policies. Either way there are multiple replicas, so there is no need for RAID. By the way, storage policies do not apply to the NN, so if possible it would be good to move the 2 x 400GB SSDs from the NN to the worker nodes.
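To make the difference between the two policies concrete, here is a small sketch of where the replicas of a block are meant to go under One_SSD and All_SSD, next to the default Hot policy. It models only the preferred placement and ignores the fallback behaviour when a storage type is full or missing. The function name is made up for this example, and the replica count of 2 is an assumption based on the two DataNodes in the question, since each DataNode holds at most one replica of a block.

```python
from typing import List


def preferred_placement(policy: str, replicas: int) -> List[str]:
    """Return the preferred storage type for each replica of a block."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    name = policy.upper()
    if name == "HOT":        # default policy: every replica on DISK
        return ["DISK"] * replicas
    if name == "ONE_SSD":    # one replica on SSD, the rest on DISK
        return ["SSD"] + ["DISK"] * (replicas - 1)
    if name == "ALL_SSD":    # every replica on SSD
        return ["SSD"] * replicas
    raise ValueError(f"policy not covered by this sketch: {policy}")


# Assumed replica count of 2, matching the two workers in the question.
for policy in ("Hot", "One_SSD", "All_SSD"):
    print(policy, preferred_placement(policy, 2))
```

For any of this to take effect, the SSD directories have to be tagged with `[SSD]` in `dfs.datanode.data.dir`, and a policy is assigned to a path with `hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>`.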