Do we config our hadoop right? JBOD vs RAID

Contributor

Additional Questions:

  1. Do we need to set RAID 1 on our SSDs?
  2. Is our Hadoop configuration okay with RAID 1/5?

Master 1 (NameNode)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching)
  • 3 x 2TB Raid 5 (hadoop)

Worker 2 (DataNode1)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching)
  • 3 x 2TB Raid 5 (hadoop)

Worker 3 (DataNode2)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching)
  • 3 x 2TB Raid 5 (hadoop)

Re: Do we config our hadoop right? JBOD vs RAID

Contributor

Thanks! @Tom N

Re: Do we config our hadoop right? JBOD vs RAID

Using RAID can reduce the availability and fault tolerance of HDFS. It certainly reduces the overall performance as compared to JBOD.

We strongly recommend configuring your disks as JBOD since HDFS already stores data redundantly by replicating across nodes/racks and can automatically recover from disk and node failures.
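To put rough numbers on this for the layout in the question, here is a minimal Python sketch of the usable capacity per worker under RAID 5 versus JBOD. The function names are made up for illustration, and it only does the capacity arithmetic; it does not model performance or rebuild times.

```python
def raid5_usable_tb(disks: int, size_tb: float) -> float:
    """RAID 5 spends one disk's worth of capacity on parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * size_tb


def jbod_usable_tb(disks: int, size_tb: float) -> float:
    """JBOD exposes every disk in full, each as its own mount point."""
    return disks * size_tb


# Disk layout from the original post: 3 x 2 TB per worker.
DISKS, SIZE_TB = 3, 2.0
raid5 = raid5_usable_tb(DISKS, SIZE_TB)
jbod = jbod_usable_tb(DISKS, SIZE_TB)
print(f"RAID 5: {raid5} TB in 1 volume; JBOD: {jbod} TB across {DISKS} volumes")
```

With JBOD each disk is a separate volume, so losing one disk only affects the blocks on that disk, and HDFS can restore them from replicas on other nodes.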

Re: Do we config our hadoop right? JBOD vs RAID

Contributor

In my configuration, is setting RAID 1 on the OS and caching disks okay, or should I change it to RAID 0?

Re: Do we config our hadoop right? JBOD vs RAID

Contributor

Thanks ! @Arpit Agarwal

Re: Do we config our hadoop right? JBOD vs RAID

For NameNode data, a fault-tolerant RAID level such as 1, 5, or 10 is fine. On worker nodes, use JBOD or a single-disk RAID-0 per disk for HDFS data, so that you have 3 mount points. RAID-1 for the OS on all nodes is fine. I'm not sure what you mean by "caching".
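As a rough illustration of the "3 mount points" idea, the sketch below builds the `dfs.datanode.data.dir` property for `hdfs-site.xml` from a list of mount points. The mount paths (`/data/1` and so on) and the `hdfs/data` subdirectory are assumptions for the example, not values from this thread; substitute whatever your disks are actually mounted as.

```python
from xml.sax.saxutils import escape


def data_dir_property(mount_points):
    """Return an hdfs-site.xml <property> listing one data dir per disk."""
    # One directory per physical disk, joined with commas.
    value = ",".join(f"{mp.rstrip('/')}/hdfs/data" for mp in mount_points)
    return (
        "<property>\n"
        "  <name>dfs.datanode.data.dir</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Hypothetical mount points, one per 2 TB disk on a worker node.
mounts = ["/data/1", "/data/2", "/data/3"]
snippet = data_dir_property(mounts)
print(snippet)
```

The DataNode then spreads blocks across the listed directories, which is what makes per-disk mounts preferable to one large RAID volume here.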


Re: Do we config our hadoop right? JBOD vs RAID

Contributor

@Predrag Minovic So my config should look like this?

Master 1 (NameNode)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching) - HOT (for HDFS)
  • 3 x 2TB Raid 5 (hadoop) - one mount point WARM,COLD,ARCHIVE

Worker 2 (DataNode1)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching) HOT (for HDFS)
  • 3 x 2TB Raid 0 (hadoop) - 3 mount points WARM,COLD,ARCHIVE

Worker 3 (DataNode2)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching) HOT (for HDFS)
  • 3 x 2TB Raid 0 (hadoop) - 3 mount points WARM,COLD,ARCHIVE

Should we also remove RAID 1 from the SSDs?

Re: Do we config our hadoop right? JBOD vs RAID

Okay, now I understand what you mean by "caching". Yes, you can remove RAID-1 from the SSDs and then experiment with the One_SSD and All_SSD storage policies. Either way there are multiple replicas, so there is no need for RAID. By the way, storage policies do not apply to the NameNode, so if possible it would be good to move the 2 x 400GB SSDs from the NameNode to the worker nodes.
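For anyone trying this, here is a small Python sketch of the two pieces involved: tagging each DataNode directory with its storage type in `dfs.datanode.data.dir`, and the `hdfs storagepolicies` command that assigns a policy to a path. The mount paths and the `/apps/hot` HDFS path are hypothetical, and the script only builds the strings; it does not run anything against a cluster.

```python
def tagged_data_dirs(ssd_mounts, disk_mounts):
    """Prefix each directory with its storage type tag for dfs.datanode.data.dir."""
    tagged = [f"[SSD]{m}" for m in ssd_mounts] + [f"[DISK]{m}" for m in disk_mounts]
    return ",".join(tagged)


def set_policy_command(path, policy):
    """Build the argument list for assigning a storage policy to an HDFS path."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


# Hypothetical mounts: 2 SSDs without RAID-1, 3 spinning disks as JBOD.
data_dirs = tagged_data_dirs(["/data/ssd1", "/data/ssd2"],
                             ["/data/1", "/data/2", "/data/3"])
# Hypothetical HDFS directory that should keep one replica on SSD.
cmd = set_policy_command("/apps/hot", "One_SSD")

print(data_dirs)
print(" ".join(cmd))
```

Check the policy names available on your version with `hdfs storagepolicies -listPolicies` before applying one.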