I have a Hadoop cluster in which each node connects to a pair of 8 Gb fabric interconnects (48 ports, all in one rack), and each server has its own dedicated 10 Gb NIC.
To save space on each node, I would like to put the OS on a SAN backed by this Cisco UCS interconnect. All Hadoop data would be stored locally on DAS on the DataNodes (JBOD).
The disks on all master nodes (and the edge node) would use RAID and hold the master components.
Only the OS would be on a SAN instead of locally.
Are there any issues with this?
I would not put the OS on the SAN. Where would the OS cache be configured? This is not usually done, so what benefit do you expect from putting the OS on the SAN? It is an interesting idea, though, and if you do try it out, please share the results.
We can store application-related data and logs on SAN/NAS.
However, SAN/NAS is not recommended at all for I/O-sensitive or CPU-bound jobs, because it creates bottlenecks when reading data from disk or over the network, or when processing it.
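To make the bottleneck argument concrete, here is a rough back-of-envelope sketch. All figures are hypothetical (12 disks per node, about 100 MB/s sequential per disk, a shared storage path sustaining about 4,000 MB/s in total). With local disks, aggregate read bandwidth grows with every node added. With a shared path it stops at that path's limit, however many nodes you add.

```python
# Hypothetical back-of-envelope comparison: aggregate scan bandwidth when every
# node reads from its own local disks (DAS/JBOD) versus when all nodes share a
# fixed-capacity storage path (SAN/NAS). All figures are assumptions.

def das_aggregate_mb_s(nodes: int, disks_per_node: int, mb_s_per_disk: float) -> float:
    """Local reads: bandwidth grows with every node and disk added."""
    return nodes * disks_per_node * mb_s_per_disk


def shared_aggregate_mb_s(nodes: int, disks_per_node: int, mb_s_per_disk: float,
                          shared_path_mb_s: float) -> float:
    """Shared storage: cluster-wide reads are capped by the shared path."""
    return min(das_aggregate_mb_s(nodes, disks_per_node, mb_s_per_disk), shared_path_mb_s)


if __name__ == "__main__":
    # Assumed: 12 disks per node, ~100 MB/s per disk, ~4,000 MB/s shared path.
    for n in (5, 10, 20):
        das = das_aggregate_mb_s(n, 12, 100.0)
        shared = shared_aggregate_mb_s(n, 12, 100.0, 4000.0)
        print(f"{n:>2} nodes: DAS {das:,.0f} MB/s vs shared {shared:,.0f} MB/s")
```

Under these assumptions, 20 nodes reach 24,000 MB/s locally but remain at 4,000 MB/s through the shared path. Real numbers depend on your disks, controllers and SAN, so treat this only as an illustration of the scaling shape.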
So: logs/application data --> SAN/NAS
DataNode data --> DAS in a JBOD configuration, no RAID
NN/SN/JT (NameNode, Secondary NameNode, JobTracker) nodes --> should be highly available (RAID 5 or RAID 10, depending on the use case)
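For the JBOD line above, each DataNode disk is mounted on its own, and HDFS is given one directory per disk. It then spreads block writes across those directories itself (round-robin by default). The sketch below builds that hdfs-site.xml property. The mount points (/data/1, /data/2, ...) and the hdfs/data subdirectory are hypothetical. `dfs.datanode.data.dir` is the Hadoop 2+ name; on Hadoop 1.x (the JobTracker era) the same setting was called `dfs.data.dir`.

```python
import xml.etree.ElementTree as ET


def jbod_data_dir_property(mount_points, subdir="hdfs/data"):
    """Build an hdfs-site.xml <property> listing one data directory per JBOD disk.

    mount_points: one mount point per physical disk (hypothetical paths), no RAID.
    subdir: hypothetical directory under each mount that the DataNode will use.
    """
    # Comma-separated list of directories, one per disk.
    dirs = ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.datanode.data.dir"
    ET.SubElement(prop, "value").text = dirs
    return prop


if __name__ == "__main__":
    mounts = [f"/data/{i}" for i in range(1, 5)]  # four hypothetical disks
    print(ET.tostring(jbod_data_dir_property(mounts), encoding="unicode"))
```

Listing disks individually instead of placing them behind RAID means a failed disk takes out only its own directory. HDFS re-replicates the affected blocks from the other copies.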
Hadoop is a scale-out, shared-nothing architecture.
I also understand that the true cost of DAS can sometimes be higher once Hadoop replication is taken into account, but this is how Hadoop thrives: one of its key tenets is to bring the compute to the storage instead of the storage to the compute.
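The replication cost mentioned above is simple arithmetic. Every block is stored `dfs.replication` times (3 by default), and some raw space is usually kept free for temporary and intermediate data. The 25% headroom and the 48 TB-per-node figure below are hypothetical assumptions for illustration, not recommendations.

```python
import math


def raw_das_tb(logical_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw DAS capacity needed to hold `logical_tb` of data in HDFS.

    replication: copies per block (HDFS dfs.replication, default 3).
    headroom: assumed fraction of raw space kept free (hypothetical default).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return logical_tb * replication / (1 - headroom)


def data_nodes_needed(logical_tb: float, tb_per_node: float,
                      replication: int = 3, headroom: float = 0.25) -> int:
    """Round up to whole DataNodes of `tb_per_node` raw JBOD capacity each."""
    return math.ceil(raw_das_tb(logical_tb, replication, headroom) / tb_per_node)


if __name__ == "__main__":
    # Hypothetical: 100 TB of data, nodes with 12 x 4 TB disks (48 TB raw each).
    print(raw_das_tb(100))             # raw TB needed
    print(data_nodes_needed(100, 48))  # DataNodes needed
```

With these assumptions, 100 TB of data needs 400 TB of raw disk, which means 9 such nodes. Placing RAID 10 underneath would roughly halve usable capacity again on top of HDFS replication, which is one reason DataNodes use JBOD.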