Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Did we configure our Hadoop cluster correctly? JBOD vs RAID

Additional Questions:

  1. Do we need to set up RAID 1 on our SSDs?
  2. Is our Hadoop configuration okay with RAID 1/5?

Master 1 (NameNode)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching)
  • 3 x 2TB Raid 5 (hadoop)

Worker 2 (DataNode1)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching)
  • 3 x 2TB Raid 5 (hadoop)

Worker 3 (DataNode2)

Raid Configuration:

  • 2 x 300GB – Raid 1 (OS)
  • 2 x 400GB SSD Raid 1 (Caching)
  • 3 x 2TB Raid 5 (hadoop)
1 ACCEPTED SOLUTION

Master Guru

For NameNode data, any fault-tolerant RAID level such as 1, 5, or 10 is fine. On worker nodes, HDFS data should go on JBOD or on a single-disk RAID-0 per drive, so that each of the 3 disks gets its own mount point. RAID-1 for the OS on all nodes is fine. I'm not sure what you mean by "caching".
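
To illustrate the "3 mount points" advice, here is a minimal sketch of how the DataNode data directories might be listed once each disk is mounted on its own. The mount points `/grid/0` to `/grid/2` and the `hadoop/hdfs/data` subdirectory are assumptions for illustration, not taken from this thread; `dfs.datanode.data.dir` is the standard `hdfs-site.xml` property that takes a comma-separated list of directories.

```python
import xml.etree.ElementTree as ET

# Hypothetical mount points: one per physical 2 TB disk (JBOD, no RAID 5).
DATA_MOUNTS = ["/grid/0", "/grid/1", "/grid/2"]


def datanode_data_dir(mounts, subdir="hadoop/hdfs/data"):
    """Join one data directory per mount into a comma-separated value."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mounts)


def hdfs_site_property(name, value):
    """Render a single <property> element as it appears in hdfs-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Print the property so it can be compared with the cluster's config.
    value = datanode_data_dir(DATA_MOUNTS)
    print(hdfs_site_property("dfs.datanode.data.dir", value))
```

With one directory per disk, HDFS spreads blocks across the drives itself and relies on block replication across nodes for redundancy, which is why RAID 5 under the DataNode directories is not needed. On an Ambari-managed cluster the value would normally be set through the HDFS configuration screen rather than by editing the file by hand.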

11 REPLIES

Great! I will take note of that. How about Spark: should I set up a buffer on the SSDs for DISK persistence?

Master Guru

Not sure about Spark, but IMO you can do that when you configure HDFS: put the SSD nodes in a separate Ambari config group and set that space aside so that HDFS does not use it.
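
On the Spark side, one possible approach (a sketch under assumptions, not something confirmed in this thread) is to point Spark's scratch space at the SSDs. Spark writes shuffle files and disk-persisted blocks to the directories named by `spark.local.dir`; when running on YARN, the NodeManager's `yarn.nodemanager.local-dirs` is used instead. The mount point `/mnt/ssd` and the helper name `spark_scratch_conf` below are hypothetical.

```python
# Hypothetical SSD mount point; adjust to the actual layout.
SSD_MOUNT = "/mnt/ssd"


def spark_scratch_conf(ssd_mount, on_yarn):
    """Return (config file, property, value) that places Spark scratch data on the SSD."""
    base = ssd_mount.rstrip("/")
    if on_yarn:
        # On YARN, Spark uses the NodeManager's local directories.
        return ("yarn-site.xml", "yarn.nodemanager.local-dirs", f"{base}/yarn/local")
    # In standalone mode, spark.local.dir is read directly.
    return ("spark-defaults.conf", "spark.local.dir", f"{base}/spark/local")


if __name__ == "__main__":
    for on_yarn in (True, False):
        conf_file, prop, value = spark_scratch_conf(SSD_MOUNT, on_yarn)
        print(f"{conf_file}: {prop} = {value}")
```

For this to work, the SSD path must stay out of `dfs.datanode.data.dir` so that HDFS does not store blocks on it, which matches the suggestion above about keeping that space away from HDFS.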