Created 01-19-2016 07:25 PM
I have a Hadoop cluster in which each node sits on a 2 x 8GB fabric interconnect (48 ports in one rack), and each server has its own dedicated 10 GB NIC.
To save space on each node, I would like to put the OS on a SAN backed by this Cisco UCS interconnect. All Hadoop data would be stored locally on DAS on the Data Nodes (JBOD).
All Master nodes (and the Edge Node) would use RAID disks and contain the Master components.
Only the OS would be on a SAN instead of locally.
Are there any issues with this?
Created 01-19-2016 07:27 PM
I see no issues with having the OS on disks backed by a SAN, as long as there is strong bandwidth between the servers and the SAN (in your case, there is).
On a separate note: Cisco doc
Created 01-19-2016 11:14 PM
The tricky part is the single point of failure (SPOF) if the SAN goes down.
Created 01-19-2016 07:39 PM
I would not put the OS on the SAN. Where would the OS cache be configured? This is not usually done. What are the benefits of putting the OS on the SAN? It is an interesting thought, and if you do try it out, please share the results.
Created 02-02-2016 04:27 PM
@Ancil McBarnett, please accept the best answer.
Created 07-19-2016 06:02 AM
We can store application-related data and logs on SAN/NAS.
However, SAN/NAS is not recommended for I/O-sensitive and CPU-bound jobs. The point is to avoid bottlenecks when reading data from disk or over the network, or when processing it.
So:
Logs/application data --> SAN/NAS
Data node data --> DAS in a JBOD configuration, no RAID
NN/SN/JT nodes --> should be highly available (RAID 5/10, depending on the use case)
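To make the JBOD point concrete: on a Data Node, each disk is normally mounted on its own and listed as a separate directory in `dfs.datanode.data.dir` (a comma-separated list in hdfs-site.xml). Below is a minimal Python sketch that builds that value. The mount points (`/grid/0` and so on), the subdirectory name, and the helper name are all hypothetical, so adjust them to your own layout.

```python
def datanode_data_dirs(mount_points, subdir="hadoop/hdfs/data"):
    """Build a comma-separated value for dfs.datanode.data.dir,
    with one directory per independently mounted JBOD disk."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


# Hypothetical layout: four local disks mounted at /grid/0 .. /grid/3.
mounts = [f"/grid/{i}" for i in range(4)]
value = datanode_data_dirs(mounts)
print(value)
```

With one directory per disk, the DataNode spreads blocks across the disks itself, which is why no RAID layer is needed underneath.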
Hadoop is a scale-out, shared-nothing architecture.
http://www.bluedata.com/blog/2015/12/separating-hadoop-compute-and-storage/
I also understand that the true cost of DAS is sometimes higher once you account for Hadoop replication, but this is how Hadoop thrives. One of the key tenets of Hadoop is to bring the compute to the storage instead of the storage to the compute.
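As a rough illustration of the replication cost, here is a small Python sketch that estimates usable HDFS capacity from raw DAS capacity. It assumes the HDFS default replication factor of 3 and ignores space reserved for the OS, logs, and temporary data. The node and disk counts are made-up numbers, not taken from this thread.

```python
def usable_capacity_tb(nodes, disks_per_node, disk_tb, replication=3):
    """Estimate usable HDFS capacity: raw capacity divided by the replication factor."""
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb / replication


# Hypothetical cluster: 10 data nodes, 12 disks each, 4 TB per disk.
raw_tb = 10 * 12 * 4
usable_tb = usable_capacity_tb(10, 12, 4)
print(f"raw: {raw_tb} TB, usable at 3x replication: {usable_tb:.0f} TB")
```

So every usable terabyte costs about three terabytes of raw DAS, which is the trade-off for keeping the data local to the compute.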