JBODs vs RAID for Data Nodes

New Contributor

Are there any articles on why you should use JBOD rather than RAID6 on the Data Nodes? I am new to Hadoop/Hortonworks (but reading everything I can) and we are still in our planning stages. We are initially going to purchase 4 servers, each with (16) 10TB SATA disks. I know you are supposed to use JBOD, but some other admins (who have not yet done the research into Hadoop) are saying they want to use RAID6. Because we are all new, I am looking for some info on recommended configurations or why RAID wouldn't be ideal. Thanks!

Re: JBODs vs RAID for Data Nodes

Super Guru

@R F

First things first. Hold off on purchasing that hardware. 10 TB per disk? You need to understand Hadoop basics before you make that purchase.

First, HDFS, Hadoop's storage layer, is a distributed file system. It was created to manage big data. Files in HDFS are large and are split into blocks, 128 MB each by default. If you have a 300 MB file, it is stored as three blocks (128 MB, 128 MB, and 44 MB) distributed across the cluster, meaning one file does not live on one machine. By default Hadoop keeps three copies of each block and spreads those copies across nodes in the cluster. In case of failure, a block can be read from a node that is still up. Since three copies of the data already exist, you don't need RAID 6. By the way, if you don't like keeping three copies of the data, newer versions of Hadoop (3.x) add support for erasure coding, which uses parity techniques similar to RAID 6 for data resiliency. Again, there is no need to implement your own RAID 6. JBOD is the way to go.
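To make the block arithmetic above concrete, here is a minimal Python sketch. The function names are made up for illustration and are not part of any Hadoop API; the 128 MB block size and replication factor of 3 are the defaults mentioned above and can be configured differently on a real cluster.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds whatever data is left over.
        blocks.append(remainder)
    return blocks


def raw_storage_mb(file_size_mb, replication=3):
    """Raw disk space consumed across the cluster once every block is replicated."""
    return file_size_mb * replication


print(split_into_blocks(300))  # [128, 128, 44]
print(raw_storage_mb(300))     # 900
```

So a 300 MB file occupies three blocks and, with three replicas, about 900 MB of raw disk across the cluster. That replication is the redundancy RAID 6 would otherwise be providing.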

Now, if you put 16x10 TB drives on one machine and that machine goes down, you lose 160 TB of data (in practice it will be less, but assume the theoretical maximum). When Hadoop loses a machine and realizes that it is operating with under-replicated blocks, it starts copying the lost blocks to other nodes to get back to three copies of the data. In your case, that would be up to 160 TB. Do the maths: at roughly 10 Gb/s of sustained bandwidth, that comes to around 35 hours of data movement across the cluster (to copy 160 TB of blocks).

I am going to write something here and publish, so you understand how this works. But don't buy 10 TB disks for Hadoop.

Re: JBODs vs RAID for Data Nodes

New Contributor

Thanks for the tip. I started to question myself about the 10TB disks as I was reading more articles, but your disaster recovery scenario is a very practical example.

Re: JBODs vs RAID for Data Nodes

Super Guru

@R F

Please see my updated answer. In short, a big no to 10 TB disks. Ideally you want to increase your cluster size instead. Go with JBOD. I personally wouldn't recommend more than 12x2TB disks, but based on my answer, plug in your numbers and decide what works for you.

Re: JBODs vs RAID for Data Nodes

Super Guru

For some reason I am unable to publish this as an article, so let me just paste it here:

Hardware vendors now offer disks of up to 8 TB capacity, which gives customers up to 96 TB of storage per node assuming twelve spindles per node. Even if you go with 12x4TB disks, that is still a whopping 48 TB of storage per node. In most cases, I have recommended 12x2TB disks to my customers over the last three years, and I continue to do so, given that bandwidth remains very expensive and, as we'll see below, is a very important component when you are sizing a cluster and deciding between high and low density data nodes.

The calculations I am sharing here were done for a customer when they told me that re-replication of blocks when a node fails takes a very long time. This customer had 12x4TB disks on each node.

So rather than preferring one opinion over the other, let's do some maths and then decide what works for your use case. There is no right or wrong answer. As long as you understand what you are doing, and what happens when a failure occurs is an acceptable risk for your business, choose that method. This article is meant to help you make that decision.

Let us make some assumptions about our jobs and disks.

Assume a server chassis that allows 12 spindles.

Have 2x1TB disks in RAID1 for OS.

10x2TB disks in JBOD (RAID0) for data nodes.

Assume 50 MB/s per spindle throughput.

In case of failure of one node, we can expect the following traffic:

10 x 50 MB/s = 500 MB/s x 0.001 (convert MB to GB) = 0.5 GB/s x 8 (convert GB to Gb) = 4 Gb/s

Assume 16 TB of data on the disks that needs to be re-replicated. 16 TB x 1000 (convert TB to GB) = 16000 GB x 8 (convert GB to Gb) = 128000 Gb.

Time required to re-replicate lost data blocks = 128000 Gb / 4 Gb per sec = 32000 seconds / 60 = 533.33 minutes / 60 = 8.89 hours.

Now see what happens when you have 48 TB of storage. Assume:

2x1TB disks in RAID1 for OS

10x4TB disks in JBOD (RAID0) for data nodes.

Again assume 50 MB/s per spindle throughput

In case of failure of one node, we can expect the following traffic:

10 x 50 MB/s = 500 MB/s x 0.001 (convert MB to GB) = 0.5 GB/s x 8 (convert GB to Gb) = 4 Gb/s

Assume 36 TB of data on the disks that needs to be re-replicated. 36 TB x 1000 (convert TB to GB) = 36000 GB x 8 (convert GB to Gb) = 288000 Gb.

Time required to re-replicate lost data blocks = 288000 Gb / 4 Gb per sec = 72000 seconds / 60 = 1200 minutes / 60 = 20 hours.

Now this can be improved if instead of a chassis with 12 disks, you have a server chassis that allows 24 disks. Then, instead of 10x4TB disks, you will have 22x2TB disks (given 2 disks will be used for OS). This improvement will come at the expense of higher bandwidth. Remember, there is no free lunch. Let's see what happens in this case.

2x1TB disks in RAID1 for OS

22x2TB disks in JBOD (RAID0) for data nodes.

Again assume 50MB/s spindle.

In case of failure of one node, we can expect following traffic.

22x50MB/s x 0.001(Convert MB to GB) = 1.1 GB/s x 8(convert GB to Gb) = 8.8 Gb/s

Assume 40 TB of data on the disks that needs to be re-replicated. 40 TB x 1000(Convert TB to GB) = 40000 GB x 8(Convert GB to Gb) = 320,000 Gb.

Time required to re-replicate lost data blocks = 320,000 Gb/8.8 Gb per sec = 36,363 seconds/60 = 606 minutes/60 = 10 hours.

So the time to re-replicate lost blocks drops to about 10 hours, well below the 10x4TB case, even though each node now holds 4 TB more data.
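If you want to plug in your own numbers, the three scenarios can be reproduced with a small Python sketch. It only encodes the unit conversions used above and assumes, as the calculations do, that aggregate spindle throughput of the failed node's disk count is the limiting factor. The function name and parameters are illustrative, not from any Hadoop tool.

```python
def rereplication_hours(data_disks, data_tb, mb_per_sec_per_disk=50):
    """Estimate hours needed to re-replicate data_tb of lost blocks."""
    # MB/s -> GB/s -> Gb/s, same conversions as in the worked examples.
    throughput_gbps = data_disks * mb_per_sec_per_disk / 1000 * 8
    # TB -> GB -> Gb.
    data_gbits = data_tb * 1000 * 8
    seconds = data_gbits / throughput_gbps
    return seconds / 3600


# (label, number of data disks, TB of data to re-replicate)
scenarios = [
    ("10 x 2TB", 10, 16),
    ("10 x 4TB", 10, 36),
    ("22 x 2TB", 22, 40),
]

for label, disks, tb in scenarios:
    hours = rereplication_hours(disks, tb)
    print(f"{label}: {hours:.1f} hours to re-replicate {tb} TB")
```

Real clusters throttle re-replication and share the network with running jobs, so treat the result as a rough planning figure rather than a prediction.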

As you have seen, a higher number of spindles improves re-replication time, but it also uses more bandwidth. Under normal circumstances, when you are not re-replicating blocks due to a failure, more spindles will also result in better performance.

Depending on the use case, assuming performance is desired, 12x2TB is better than 12x4TB and similarly 24 x 1TB is better than 12x2TB.

Your choice of the number of disks should also consider other factors, such as the MTTF of a disk, which affects the number of failures you can expect as you increase the number of disks. That is a discussion for some other time.
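As a rough illustration of the MTTF point, the sketch below estimates the expected number of disk failures per node per year using the common constant-failure-rate approximation (disk count times hours per year divided by MTTF). The 1,000,000-hour MTTF is a hypothetical placeholder, not a figure from this thread; substitute the value from your own disk's datasheet or your observed failure rates.

```python
HOURS_PER_YEAR = 24 * 365


def expected_disk_failures_per_year(disk_count, mttf_hours):
    """Approximate expected failures per year, assuming a constant failure rate."""
    return disk_count * HOURS_PER_YEAR / mttf_hours


# Hypothetical MTTF, for illustration only.
ASSUMED_MTTF_HOURS = 1_000_000

for disks in (12, 24):
    failures = expected_disk_failures_per_year(disks, ASSUMED_MTTF_HOURS)
    print(f"{disks} disks per node: ~{failures:.2f} expected failures per year")
```

Under this approximation, doubling the number of spindles doubles the expected number of disk failures, which is the trade-off to weigh against the faster re-replication.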

Re: JBODs vs RAID for Data Nodes

New Contributor

Thanks for this. It is great info and helps me in my research. Can you give me the blog or website you got this from?

Re: JBODs vs RAID for Data Nodes

Super Guru

@R F

I actually wrote this myself. I am trying to publish it but apparently, I am unable to. I'll let you know if I am able to publish it here.

Re: JBODs vs RAID for Data Nodes

New Contributor

Awesome thanks!

Re: JBODs vs RAID for Data Nodes

Super Guru

You are welcome. If one of my answers helped and answers your original question, please accept it.