Created 03-14-2017 03:34 PM
My question on HDFS using SAN as the back-end storage has 3 main parts:
1. Is it feasible to use SAN as the back-end storage for HDFS?
2. What are the pros and cons of using SAN or NAS for HDFS?
3. Has it been tested for performance, and maybe other aspects?
Created 03-15-2017 02:18 AM
I assume you are asking about a production environment.
Hadoop is a scale-out, shared-nothing architecture.
SAN/NAS is not recommended for I/O-sensitive and CPU-bound jobs, because shared storage can become a bottleneck when reading data from disk, moving it over the network, or processing it.
It is possible to use it, but I haven't found an implementation that delivers good performance. I would not recommend it for production. For a dev environment, maybe.
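To make the bottleneck argument concrete, here is a back-of-the-envelope sketch in Python. All figures (node count, disks per node, per-disk throughput, array limit) are made-up assumptions for illustration, not measurements from any real cluster or product.

```python
def das_aggregate_mb_s(nodes, disks_per_node, disk_mb_s):
    """Local JBOD: each DataNode reads its own disks, so throughput adds up."""
    return nodes * disks_per_node * disk_mb_s


def shared_aggregate_mb_s(nodes, per_node_demand_mb_s, array_limit_mb_s):
    """Shared array: total throughput is capped by the array/fabric limit."""
    return min(nodes * per_node_demand_mb_s, array_limit_mb_s)


# Hypothetical numbers: 12 disks per node at 100 MB/s each,
# and a shared array that tops out at 4000 MB/s in total.
for nodes in (5, 10, 20):
    das = das_aggregate_mb_s(nodes, disks_per_node=12, disk_mb_s=100)
    shared = shared_aggregate_mb_s(nodes, per_node_demand_mb_s=1200,
                                   array_limit_mb_s=4000)
    print(f"{nodes} nodes: local JBOD {das} MB/s, shared array {shared} MB/s")
```

Under these assumptions the local-disk figure keeps growing as nodes are added, while the shared-array figure stays flat once its limit is reached. Real arrays and networks behave in more complicated ways, so treat this only as an illustration of the scale-out idea.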
Maybe 5% of companies running Hadoop use Isilon for it, mostly those with a close relationship with EMC. There are reference deployments using storage arrays like Isilon, and Hortonworks supports it. Performance is lower than with internal JBOD disks, but it works.
Yes, it has been tested. You may want to look at EMC's published articles on Isilon. I can't share confidential data that is not in the public domain. If you need confidential data, check with your company's EMC/Dell or Hortonworks account manager.
Take a look at the following:
https://community.hortonworks.com/questions/15332/san-vs-dasjbod-on-data-node.html which shows the differences between SAN and DAS/JBOD when used with Hadoop.
++++
Hopefully this helps. If it does, you can vote for or accept the best answer.
Created 08-07-2019 05:42 PM
With the advent of heterogeneous storage in HDFS, can we now look at NAS in a new light?
Potentially we could label NAS mounts on the DataNodes as archive storage and have HDFS move data there when it becomes cold.
I would like to hear opinions on this.
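A rough sketch of what this could look like, written as a small Python helper that only builds the strings involved. The `[DISK]`/`[ARCHIVE]` tags on `dfs.datanode.data.dir`, the `COLD` storage policy, and the `hdfs storagepolicies` and `hdfs mover` commands are part of HDFS heterogeneous storage; the directory paths and the `/data/cold` path are hypothetical. Note that HDFS does not decide on its own that data is cold: as far as I know, someone has to set the policy on a path and then run the mover.

```python
def data_dir_value(disk_dirs, archive_dirs):
    """Build a dfs.datanode.data.dir value with storage-type tags."""
    tagged = [f"[DISK]{d}" for d in disk_dirs]
    tagged += [f"[ARCHIVE]{d}" for d in archive_dirs]
    return ",".join(tagged)


def cold_tier_commands(path):
    """Commands to mark a path COLD and migrate its blocks to ARCHIVE."""
    return [
        ["hdfs", "storagepolicies", "-setStoragePolicy",
         "-path", path, "-policy", "COLD"],
        ["hdfs", "mover", "-p", path],
    ]


# Hypothetical layout: one local disk directory plus one NAS mount per DataNode.
value = data_dir_value(["file:///grid/0/hdfs/data"],
                       ["file:///mnt/nas/hdfs/archive"])
print("dfs.datanode.data.dir =", value)

for cmd in cold_tier_commands("/data/cold"):
    print(" ".join(cmd))
```

The helper only prints the setting and the commands, so it can be run anywhere without a cluster. Whether a NAS mount performs acceptably as an ARCHIVE tier is a separate question that would need testing.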