Feasibility and recommendation for running HDFS on SAN or NAS?

Expert Contributor

My question about using SAN as the backend storage for HDFS has three parts:

1. Is it feasible to use SAN as the backend storage for HDFS?

2. What are the pros and cons of using SAN or NAS for HDFS?

3. Has it been tested for performance and possibly other aspects?

1 ACCEPTED SOLUTION

Super Guru

@learninghuman

I assume that you are asking about a production environment.

Hadoop is a scale-out, shared-nothing architecture.

SAN/NAS is not recommended for I/O-sensitive or CPU-bound jobs, because shared storage can become a bottleneck when reading data from disk, moving it over the network, or processing it.
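To illustrate the bottleneck argument, here is a rough back-of-envelope sketch. Every number in it (disk count, per-disk throughput, shared-link throughput) is a hypothetical placeholder rather than a measurement, and the model ignores network, replication, and CPU effects. It only shows that local disks add read bandwidth with every node, while a shared array is capped by its own link.

```python
def aggregate_read_mb_s(nodes, disks_per_node, disk_mb_s, shared_link_mb_s=None):
    """Rough upper bound on cluster read throughput in MB/s.

    shared_link_mb_s=None models local JBOD disks (shared nothing).
    A number models a SAN/NAS whose total bandwidth is capped by one shared link.
    """
    local = nodes * disks_per_node * disk_mb_s
    if shared_link_mb_s is None:
        return local
    return min(local, shared_link_mb_s)


if __name__ == "__main__":
    # Hypothetical figures: 12 disks per node, 100 MB/s per disk,
    # 5000 MB/s total for the shared array.
    for nodes in (10, 20, 40):
        jbod = aggregate_read_mb_s(nodes, 12, 100)
        shared = aggregate_read_mb_s(nodes, 12, 100, shared_link_mb_s=5000)
        print(f"{nodes} nodes: JBOD <= {jbod} MB/s, shared array <= {shared} MB/s")
```

Under these assumed numbers the JBOD bound grows linearly with node count, while the shared-array bound stays flat once the link is saturated.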

It is possible to use, but I haven't found an implementation that delivers good performance. I would not recommend it for production. For a dev environment, maybe.

Maybe 5% of companies running Hadoop use Isilon for it, mostly those that have a close relationship with EMC. There are reference deployments using storage arrays like Isilon, and Hortonworks supports it. Performance is lower than with internal JBOD disks, but it works.

Yes, it has been tested. You may want to look at the articles EMC has published on Isilon. I can't provide confidential data that is not in the public domain. If you need confidential data, you could check with the EMC/Dell or Hortonworks account manager for your company.

Take a look at the following:

https://community.hortonworks.com/questions/15332/san-vs-dasjbod-on-data-node.html, which covers the differences between SAN and DAS/JBOD when used with Hadoop.

++++

Hopefully this helps, and you can vote for or accept the best answer.

2 REPLIES

Contributor

With the advent of heterogeneous storage for HDFS, can we now look at NAS in a new light?

Potentially we could label NAS mounts on a DataNode as ARCHIVE storage and have HDFS move data there when it becomes cold.
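As a minimal sketch of what that setup might look like: HDFS lets each entry in `dfs.datanode.data.dir` carry a storage-type tag such as `[DISK]` or `[ARCHIVE]`, and the `COLD` storage policy together with the `hdfs mover` tool places blocks on ARCHIVE storage. The helper functions and the mount paths below are hypothetical and only assemble the config value and the commands as strings. Whether a NAS mount behaves well as an ARCHIVE volume is an assumption that would need testing.

```python
def data_dir_value(disk_dirs, archive_dirs):
    """Build a dfs.datanode.data.dir value with storage-type tags.

    disk_dirs: local JBOD directories, tagged [DISK].
    archive_dirs: NAS mount points (hypothetical), tagged [ARCHIVE].
    """
    entries = [f"[DISK]file://{d}" for d in disk_dirs]
    entries += [f"[ARCHIVE]file://{d}" for d in archive_dirs]
    return ",".join(entries)


def cold_tiering_commands(hdfs_path):
    """Commands to mark a directory COLD and migrate its existing blocks."""
    return [
        f"hdfs storagepolicies -setStoragePolicy -path {hdfs_path} -policy COLD",
        f"hdfs mover -p {hdfs_path}",
    ]


if __name__ == "__main__":
    # Example paths are placeholders, not recommendations.
    print(data_dir_value(["/grid/0/hdfs/data"], ["/mnt/nas/hdfs/archive"]))
    for cmd in cold_tiering_commands("/data/cold"):
        print(cmd)
```

Setting the policy only affects where new blocks are placed. The mover run is what relocates blocks already written, so it would typically be scheduled periodically.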

I would like to hear opinions on this.