
Can I place the OS for the nodes of a Hadoop cluster on a SAN while the Hadoop data and components/bits reside on local disks?


I have a Hadoop cluster in which each node is connected to a 2 x 8 Gb fabric interconnect (48 ports, on one rack), and each server has its own dedicated 10 Gb NIC.

To save space on each node, I would like to put the OS on a SAN backed by this Cisco UCS interconnect. All Hadoop data would be stored locally on DAS (JBOD) on the DataNodes.

On all master nodes (and the edge node), the disks would be configured as RAID and would hold the master components.

Only the OS would be on a SAN instead of locally.

Are there any issues with this?

1 ACCEPTED SOLUTION

Master Mentor

@Ancil McBarnett

I see no issues with having the OS on a disk backed by the SAN, as long as there is strong bandwidth between the servers and the SAN (which, in your case, there is).

On a separate note, here is a Cisco document:

Cisco UCS Big Data Updated Sept. 2015 Version


Replies

Master Mentor

@Ancil McBarnett

The tricky part is the single point of failure (SPOF) if the SAN goes down.
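To put a rough number on that SPOF concern, here is a minimal availability sketch in Python. It assumes node and SAN failures are independent, and that a SAN outage takes every node's OS down at once. The availability figures in the example are hypothetical, not measured.

```python
from math import comb


def p_at_least_k_up(n: int, k: int, a_node: float) -> float:
    """Probability that at least k of n nodes are up, assuming each node is
    up independently with probability a_node (binomial model)."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    return sum(comb(n, i) * a_node**i * (1 - a_node) ** (n - i) for i in range(k, n + 1))


def p_at_least_k_up_shared_san(n: int, k: int, a_node: float, a_san: float) -> float:
    """Same question when every node boots its OS from one shared SAN.

    Assumption: if the SAN is down, no node is up (a cluster-wide SPOF),
    and the SAN fails independently of the nodes themselves.
    """
    return a_san * p_at_least_k_up(n, k, a_node)


if __name__ == "__main__":
    # Hypothetical figures for illustration only: 10 nodes, at least 8 must be up,
    # 99% per-node availability, 99.9% SAN availability.
    local = p_at_least_k_up(10, 8, 0.99)
    shared = p_at_least_k_up_shared_san(10, 8, 0.99, 0.999)
    print(f"local OS disks:   {local:.6f}")
    print(f"OS on shared SAN: {shared:.6f}")
```

The shared-SAN result can never exceed the local-disk result, because the SAN's availability multiplies every node's chance of being up.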

@Ancil McBarnett

I would not put the OS on the SAN. Where would the OS cache be configured? This is usually not done; what are the benefits of putting the OS on a SAN? It is an interesting thought, and if you do try it out, please share the results.

Master Mentor

@Ancil McBarnett, please accept the best answer.

Expert Contributor

Application-related data and logs can be stored on SAN/NAS.

However, SAN/NAS is not at all recommended for I/O-sensitive and CPU-bound jobs, to avoid bottlenecks when reading data from disk or over the network, or when processing it.

So, for logs/application data --> SAN/NAS

DataNode data --> DAS in a JBOD configuration, NO RAID

NN/SN/JT nodes --> should be highly available [RAID 5/10, depending on the use case]
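Because the RAID choice for the master nodes depends on the use case, a quick capacity comparison can help. This is a minimal sketch that only looks at usable space. It ignores filesystem overhead, rebuild time and write performance (where RAID 10 generally does better than RAID 5), and the 6 x 2 TB disk layout in the example is hypothetical.

```python
def usable_tb(layout: str, disks: int, disk_tb: float) -> float:
    """Usable capacity for a few common layouts (ignores filesystem overhead).

    jbod:   each disk used on its own, no redundancy (HDFS replication covers DataNodes)
    raid5:  one disk's worth of parity, survives one disk failure, needs >= 3 disks
    raid10: striped mirrored pairs, half the raw capacity, needs an even count >= 4
    """
    if layout == "jbod":
        return disks * disk_tb
    if layout == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb
    if layout == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return (disks // 2) * disk_tb
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    # Hypothetical master node with 6 x 2 TB disks.
    for layout in ("raid5", "raid10"):
        print(layout, usable_tb(layout, 6, 2.0), "TB usable")
```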

Hadoop is a scale-out, shared-nothing architecture.
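Here is a back-of-envelope sketch of why shared-nothing DAS scales: total read throughput grows with every node added, while a single shared storage path caps it. The per-disk speed, disk count and 5,000 MB/s shared limit below are hypothetical, and the model ignores CPU, network fabric and data skew.

```python
from typing import Optional


def aggregate_read_mb_s(nodes: int, disks_per_node: int, disk_mb_s: float,
                        shared_link_mb_s: Optional[float] = None) -> float:
    """Idealized cluster-wide sequential read throughput.

    With local DAS (shared nothing), every disk on every node streams at once,
    so throughput grows linearly with the node count. If all reads instead go
    through one shared storage path (shared_link_mb_s), that path caps the total.
    """
    local = nodes * disks_per_node * disk_mb_s
    if shared_link_mb_s is None:
        return local
    return min(local, shared_link_mb_s)


if __name__ == "__main__":
    # Hypothetical: 12 disks per node at ~100 MB/s each, shared path capped at 5,000 MB/s.
    for nodes in (5, 10, 20):
        das = aggregate_read_mb_s(nodes, 12, 100.0)
        san = aggregate_read_mb_s(nodes, 12, 100.0, shared_link_mb_s=5000.0)
        print(f"{nodes:>2} nodes: DAS {das:,.0f} MB/s vs shared path {san:,.0f} MB/s")
```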

http://www.bluedata.com/blog/2015/12/separating-hadoop-compute-and-storage/

https://community.emc.com/servlet/JiveServlet/previewBody/41473-102-1-132603/Virtualizing%20Hadoop%2...

Also, I understand that the true cost of DAS is sometimes higher once Hadoop replication is taken into account, but this is how Hadoop thrives: one of its key tenets is to bring the compute to the storage instead of the storage to the compute.
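The replication cost can be estimated with simple arithmetic: raw DAS is roughly the usable data times the replication factor, plus some free space. This sketch assumes the HDFS default replication factor of 3 (`dfs.replication`) and a hypothetical 25% headroom for non-HDFS data. Both are assumptions to adjust for your own cluster.

```python
def raw_das_tb(usable_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk capacity needed to hold usable_tb of HDFS data.

    replication: HDFS block replication factor (dfs.replication, default 3).
    headroom: fraction of raw space kept free for non-HDFS use such as
              intermediate/shuffle data; 0.25 is an illustrative assumption,
              not an HDFS default.
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    # Each block is stored `replication` times, and only (1 - headroom) of raw space holds HDFS data.
    return usable_tb * replication / (1.0 - headroom)


if __name__ == "__main__":
    print(raw_das_tb(100.0), "TB raw for 100 TB of usable HDFS data")
```

Under those assumptions, 100 TB of usable HDFS data needs about 400 TB of raw disk.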