Is our Hadoop configured correctly? JBOD vs RAID
Labels: Apache Hadoop
Created 04-05-2017 01:08 AM
Additional questions:
- Do we need RAID 1 on our SSDs?
- Is our Hadoop configuration okay with RAID 1/5?
Master 1 (NameNode)
RAID configuration:
- 2 x 300GB RAID 1 (OS)
- 2 x 400GB SSD RAID 1 (Caching)
- 3 x 2TB RAID 5 (Hadoop)
Worker 2 (DataNode1)
RAID configuration:
- 2 x 300GB RAID 1 (OS)
- 2 x 400GB SSD RAID 1 (Caching)
- 3 x 2TB RAID 5 (Hadoop)
Worker 3 (DataNode2)
RAID configuration:
- 2 x 300GB RAID 1 (OS)
- 2 x 400GB SSD RAID 1 (Caching)
- 3 x 2TB RAID 5 (Hadoop)
Created 04-07-2017 01:00 AM
For NameNode (NN) data, a fault-tolerant RAID level such as 1, 5, or 10 is fine. On worker nodes, use JBOD or a single-disk RAID-0 per disk for HDFS data (so that you have 3 mount points). RAID-1 for the OS is fine on all nodes. I'm not sure what you mean by "caching".
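As a rough sketch of what "3 mount points" means on a DataNode: with JBOD each disk gets its own filesystem and mount point, and all of them are listed in the `dfs.datanode.data.dir` property in hdfs-site.xml. The mount paths (`/grid/0` and so on), the `hdfs/data` subdirectory, and the helper names below are hypothetical and not taken from this thread.

```python
# Build the comma-separated value for dfs.datanode.data.dir from one
# directory per physical disk (JBOD). All paths here are hypothetical.

def datanode_data_dirs(mount_points, subdir="hdfs/data"):
    """Return the comma-separated directory list, one entry per disk."""
    return ",".join(f"{mp.rstrip('/')}/{subdir}" for mp in mount_points)


def as_property_xml(name, value):
    """Render a single hdfs-site.xml <property> element as text."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


# Three 2TB data disks, each mounted on its own (assumed) mount point.
mounts = ["/grid/0", "/grid/1", "/grid/2"]
value = datanode_data_dirs(mounts)
print(as_property_xml("dfs.datanode.data.dir", value))
```

Listing each disk separately lets the DataNode spread blocks across the disks itself, and a single failed disk only takes out the blocks stored on that one directory.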
Created 04-05-2017 03:52 AM
Ping @Arpit Agarwal
Created 04-05-2017 11:32 PM
Created 04-07-2017 06:53 AM
Thanks! @Tom N
Created 04-06-2017 04:33 PM
Using RAID can reduce the availability and fault tolerance of HDFS. It certainly reduces the overall performance as compared to JBOD.
We strongly recommend configuring your disks as JBOD since HDFS already stores data redundantly by replicating across nodes/racks and can automatically recover from disk and node failures.
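To put rough numbers on the redundancy overlap, here is a small back-of-the-envelope sketch for the disk layout in the question (3 x 2TB data disks on each of the 2 workers). It only compares capacity, not performance. The replication factor of 2 is an assumption chosen because the cluster has two DataNodes; the thread does not state the actual `dfs.replication` setting.

```python
# Compare usable capacity of RAID 5 vs JBOD for the data disks in the
# question, before and after HDFS replication. Illustrative arithmetic only.

def raid5_usable_tb(disks, size_tb):
    """RAID 5 spends one disk's worth of space on parity."""
    return (disks - 1) * size_tb


def jbod_usable_tb(disks, size_tb):
    """JBOD exposes every disk in full."""
    return disks * size_tb


def hdfs_logical_tb(raw_tb_per_node, nodes, replication):
    """Data that fits once every block is stored `replication` times."""
    return raw_tb_per_node * nodes / replication


DISKS, SIZE_TB, WORKERS = 3, 2, 2
REPLICATION = 2  # assumption: not stated in the thread

for label, per_node in [
    ("RAID 5", raid5_usable_tb(DISKS, SIZE_TB)),
    ("JBOD", jbod_usable_tb(DISKS, SIZE_TB)),
]:
    logical = hdfs_logical_tb(per_node, WORKERS, REPLICATION)
    print(f"{label}: {per_node} TB per node, {logical} TB of HDFS data")
```

With RAID 5 underneath, the same data is protected twice (parity plus HDFS replicas), and a third of each worker's raw disk space goes to parity.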
Created 04-07-2017 12:52 AM
In my configuration, is RAID 1 okay for the OS and caching disks, or should I change it to RAID 0?
Created 04-07-2017 06:53 AM
Thanks! @Arpit Agarwal
Created 04-07-2017 04:46 AM
@Predrag Minovic So my config should look like this:
Master 1 (NameNode)
RAID configuration:
- 2 x 300GB RAID 1 (OS)
- 2 x 400GB SSD RAID 1 (Caching) - HOT (for HDFS)
- 3 x 2TB RAID 5 (Hadoop) - one mount point, WARM, COLD, ARCHIVE
Worker 2 (DataNode1)
RAID configuration:
- 2 x 300GB RAID 1 (OS)
- 2 x 400GB SSD RAID 1 (Caching) - HOT (for HDFS)
- 3 x 2TB RAID 0 (Hadoop) - 3 mount points, WARM, COLD, ARCHIVE
Worker 3 (DataNode2)
RAID configuration:
- 2 x 300GB RAID 1 (OS)
- 2 x 400GB SSD RAID 1 (Caching) - HOT (for HDFS)
- 3 x 2TB RAID 0 (Hadoop) - 3 mount points, WARM, COLD, ARCHIVE
Should we also remove RAID 1 from the SSDs?
Created 04-07-2017 05:41 AM
Okay, now I understand what you mean by "caching". Yes, you can remove RAID-1 from the SSDs and then experiment with the One_SSD and All_SSD storage policies. Either way there are multiple replicas, so there is no need for RAID. By the way, storage policies do not apply to the NN, so if possible it would be good to move the 2 x 400GB SSDs from the NN to the worker nodes.
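A minimal sketch of how that could look on a worker once the SSDs are no longer in RAID-1: each volume in `dfs.datanode.data.dir` is prefixed with its storage type (`[SSD]`, `[DISK]`), and a policy such as One_SSD is then assigned to an HDFS directory with `hdfs storagepolicies -setStoragePolicy`. The local mount paths, the HDFS directory `/data/hot`, and the helper names are hypothetical; only the tag syntax, the policy names, and the CLI subcommand come from HDFS itself.

```python
import shlex

# Tag each DataNode volume with its storage type so that storage policies
# (One_SSD, All_SSD, ...) can place replicas on the right media.
# All paths below are hypothetical examples.

def tagged_data_dirs(volumes):
    """volumes: (storage_type, path) pairs -> dfs.datanode.data.dir value."""
    return ",".join(f"[{stype}]{path}" for stype, path in volumes)


def set_policy_command(hdfs_path, policy):
    """Build the CLI call that assigns a storage policy to an HDFS path."""
    return (
        "hdfs storagepolicies -setStoragePolicy "
        f"-path {shlex.quote(hdfs_path)} -policy {policy}"
    )


worker_volumes = [
    ("SSD", "/grid/ssd0/hdfs/data"),   # 2 x 400GB SSD, no RAID
    ("SSD", "/grid/ssd1/hdfs/data"),
    ("DISK", "/grid/0/hdfs/data"),     # 3 x 2TB, one mount point each
    ("DISK", "/grid/1/hdfs/data"),
    ("DISK", "/grid/2/hdfs/data"),
]

print(tagged_data_dirs(worker_volumes))
print(set_policy_command("/data/hot", "One_SSD"))
```

One_SSD keeps one replica on SSD and the rest on DISK, while All_SSD puts every replica on SSD, so the choice mostly depends on how much SSD capacity the workers end up with.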
