Member since
06-15-2016
06-15-2016
07:10 PM
@mqureshi Thanks for the reply. The SAN is a Nimble SAN, and the VMs hosting Hadoop sit on a Cisco UCS environment. We have fiber connectivity between the UCS and the SAN, so my throughput is around 16GB/s, and Hadoop is driving around 1.6k IOPS. I've already run a few of our production test jobs in this environment with great results, so performance and latency aren't a concern at this point. We share the UCS / SAN with other environments and have resource / storage policies in place to segregate resources as much as possible. I'm just trying to take this further by separating the storage and compute nodes, and I'm curious how that configuration / architecture would look with respect to an HDP deployment. Thanks for your input.
06-15-2016
05:19 PM
Hello, I'm currently running some proofs of concept on a new HDP 2.4 cluster that I am virtualizing on an all-flash SAN back end. I've been reading articles such as http://www.bluedata.com/blog/2015/12/separating-hadoop-compute-and-storage/ and http://www.infostor.com/disk-arrays/hadoop-storage-options-time-to-ditch-das.html

My question is: are there any concerns or design considerations when doing this with HDP? Would this essentially mean one set of nodes running purely HDFS / HBase RegionServers for storage, and another set of nodes running MR and YARN for compute? The whole concept of splitting compute and storage is very new to me. I'm used to having all machines be identical, with the DAS approach that I currently use in my production environment.

Also, what would the configuration files look like for this? I assume the DataNode directories parameter would hold the shared storage / SAN endpoint for HDFS? That would mean the SAN would have to be set up with native HDFS volumes. Correct me if I'm wrong. I realize this may go against the established fundamentals of how to use this software, but like I said, this is purely for R&D PoC testing.
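
To make the configuration question concrete, here is a minimal sketch of the setting I have in mind. The DataNode directories parameter is `dfs.datanode.data.dir` in hdfs-site.xml, and it takes a comma-separated list of directories. The sketch assumes the SAN LUNs are presented to each DataNode VM as ordinary block devices and mounted as local filesystems; the mount points and the helper function name are hypothetical, not taken from a real deployment.

```python
import xml.etree.ElementTree as ET


def hdfs_site_fragment(data_dirs):
    """Build a <configuration> element that sets dfs.datanode.data.dir."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "dfs.datanode.data.dir"
    # HDFS expects a comma-separated list of directories here.
    ET.SubElement(prop, "value").text = ",".join(data_dirs)
    return configuration


# Hypothetical mount points for SAN-backed volumes on one DataNode VM.
san_mounts = ["/grid/0/hadoop/hdfs/data", "/grid/1/hadoop/hdfs/data"]

fragment = hdfs_site_fragment(san_mounts)
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

Under that assumption the DataNode only sees local paths, so the same property would be used whether the disks behind those paths are DAS or SAN volumes.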