I would like to ask you some questions about the usage of Isilon.
First of all, what do you currently consider the best practices for cluster architecture when comparing Isilon HDFS with CDH HDFS? Do you think Isilon has limitations compared with CDH HDFS, e.g. security (Sentry, Kerberos, LDAP), performance, etc.?
Could I combine the two solutions above, e.g.
1. extending an existing CDH HDFS cluster with Isilon
2. using Isilon as a backup for an existing CDH HDFS cluster
or do I have to create a new cluster that uses only Isilon?
Finally, as far as I can see, Isilon is not supported with CDH 6.2. Which will be the next version of CDH to support it?
Thank you very much,
Thank you very much for your response.
Comparing CDH Hadoop with Isilon HDFS, which of the two do you consider the better main storage?
Are there certain conditions or use cases (e.g. performance, consistency, scalability, stability, storage space utilization, reliability, etc.) in which you consider one better than the other?
As you already pointed out, this depends on a number of factors, and largely on your use case and existing organizational context.
Compared with HDFS in a classic compute/storage-coupled Hadoop cluster, some of the discussion here also applies: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sdx_vpc.html. This is because Isilon is network-attached storage and, similar to using Cloudera Virtual Private Clusters, this has some implications for performance, especially for workloads with high performance requirements. I have also seen environments where using Isilon instead of HDFS had an impact on Impala performance.
In terms of reliability and stability, you can argue either way, depending on your architecture. However, a multi-datacenter deployment is likely to be easier to implement with Isilon, due to its enterprise-grade replication and failover capabilities.
In terms of storage space efficiency, Isilon has the advantage. However, its higher cost compared to JBOD-based HDFS might make this point irrelevant.
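To make the storage-efficiency point a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It only assumes that HDFS uses its default replication factor of 3. The Isilon overhead factor (1.25) and both prices per raw TB are made-up placeholders, not vendor figures; the real Isilon overhead depends on the protection level and cluster size, so plug in your own numbers. The function names are hypothetical helpers for this illustration.

```python
def usable_tb(raw_tb: float, overhead_factor: float) -> float:
    """Usable capacity for a given raw capacity and raw-to-usable overhead factor."""
    if overhead_factor < 1:
        raise ValueError("overhead_factor must be >= 1")
    return raw_tb / overhead_factor


def cost_per_usable_tb(cost_per_raw_tb: float, overhead_factor: float) -> float:
    """Cost of one usable TB: each usable TB consumes overhead_factor raw TB."""
    return cost_per_raw_tb * overhead_factor


HDFS_OVERHEAD = 3.0     # HDFS default replication factor (dfs.replication = 3)
ISILON_OVERHEAD = 1.25  # ASSUMPTION: placeholder, depends on protection level

RAW_TB = 1000.0
hdfs_usable = usable_tb(RAW_TB, HDFS_OVERHEAD)
isilon_usable = usable_tb(RAW_TB, ISILON_OVERHEAD)

# ASSUMPTION: placeholder prices per raw TB in arbitrary units, not real quotes.
hdfs_cost = cost_per_usable_tb(100.0, HDFS_OVERHEAD)
isilon_cost = cost_per_usable_tb(300.0, ISILON_OVERHEAD)

print(f"HDFS:   {hdfs_usable:.0f} TB usable, {hdfs_cost:.0f} per usable TB")
print(f"Isilon: {isilon_usable:.0f} TB usable, {isilon_cost:.0f} per usable TB")
```

With these placeholder values Isilon yields more usable space from the same raw capacity but still costs more per usable TB, which is the situation where the efficiency advantage stops mattering. With different prices the result can flip.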
For scalability, I guess it again depends on your organizational setup. You can easily scale up Isilon by buying more boxes from EMC, and there are certainly some really large Isilon deployments out there. On the other hand, scaling HDFS is not hard either and also lets you build huge deployments.
In the end, it is a tradeoff: higher costs but easier management with Isilon vs. lower costs but more effort with HDFS.