Isilon HDFS vs CDH HDFS

New Contributor

Hello,

I would like to ask a few questions about using Isilon.
First of all, what do you currently consider the best practices for cluster architecture when comparing Isilon HDFS with CDH HDFS? Do you think Isilon has limitations compared with CDH HDFS, e.g. in security (Sentry, Kerberos, LDAP), performance, etc.?
Could I combine the two solutions, e.g.
    1. extending an existing CDH HDFS cluster with Isilon
    2. using Isilon as a backup of an existing CDH HDFS cluster
or do I have to create a new cluster that uses only Isilon?

Finally, as far as I can see, Isilon is not supported as of CDH 6.2. Which will be the next CDH version to support it?

Thank you very much,
Lefteris Souvleros

Mentor
Our Isilon documentation page covers some of your questions, including the differences in security features (as of this posting, the Isilon solution did not support ACLs or transparent encryption, but it does support Kerberos authentication): https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_isilon_service.html

> extending an existing CDH HDFS cluster with Isilon

If by extending you mean "merging" the storage under a common namespace, that is not currently possible (in CDH 5.x/6.x).

> using Isilon as a backup of an existing CDH HDFS cluster

Cloudera Enterprise BDR (Backup and Disaster Recovery) supports replicating to and from Isilon in addition to HDFS, so this is doable: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_pcm_bdr.html#supported_r...
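
BDR schedules and manages such replications for you. Purely to illustrate the underlying idea, and not as a description of how BDR itself works, a one-off copy from a CDH HDFS cluster to an Isilon HDFS endpoint could be scripted around the standard `hadoop distcp` tool. The host names, port and paths below are hypothetical placeholders, and on a Kerberized cluster you would need a valid ticket first. Since the Isilon solution did not support ACLs at the time, the sketch preserves only user, group and permissions.

```python
"""Build (and optionally run) a `hadoop distcp` command for a one-off HDFS -> Isilon copy.

Host names, ports and paths are hypothetical placeholders.
"""
from __future__ import annotations

import shlex
import subprocess


def build_distcp_command(src_uri: str, dst_uri: str, update: bool = True,
                         preserve: str = "") -> list[str]:
    """Return the argv for `hadoop distcp` copying src_uri to dst_uri.

    -update copies only files that are missing or differ at the destination;
    -p<flags> preserves file attributes (e.g. "ugp" = user, group, permission).
    """
    cmd = ["hadoop", "distcp"]
    if update:
        cmd.append("-update")
    if preserve:
        cmd.append("-p" + preserve)
    cmd += [src_uri, dst_uri]
    return cmd


def run_distcp(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError if distcp exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_distcp_command(
        "hdfs://cdh-namenode.example.com:8020/data/warehouse",   # hypothetical source
        "hdfs://isilon-zone.example.com:8020/backup/warehouse",  # hypothetical Isilon target
        preserve="ugp",  # no "a" (ACLs), since Isilon did not support them
    )
    print(shlex.join(cmd))
    # run_distcp(cmd)  # uncomment on a cluster edge node to actually copy
```

Re-running the same command with `-update` transfers only files that changed, which is roughly what a periodic backup needs; scheduling and monitoring are among the things BDR handles for you.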

New Contributor

Thank you very much for your response.

Comparing CDH HDFS with Isilon HDFS, which of the two do you consider the better primary storage?
Are there particular conditions or use cases (e.g. performance, consistency, scalability, stability, storage space utilization, reliability) in which you consider one better than the other?

Expert Contributor

Hi @lsouvleros,

As you already pointed out, this depends on a number of factors and is largely driven by your use case and existing organizational context.

Compared with HDFS in a classic Hadoop cluster where compute and storage are coupled, some of the considerations discussed here also apply: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sdx_vpc.html. This is because Isilon is network-attached storage and, similar to using Cloudera Virtual Private Clusters, this has implications for performance, especially for workloads with high performance requirements. I have also seen environments where using Isilon instead of HDFS had an impact on Impala performance.
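
One rough way to see why network-attached storage can matter for performance is an aggregate-throughput bound: with co-located HDFS, a scan can read from every worker's local disks in parallel, whereas with remote storage every byte crosses the network and is capped by the worker NICs and by the storage side's own network capacity. The sketch below uses purely hypothetical hardware numbers (substitute your own) and ignores caching, CPU limits, shuffle traffic sharing the NICs, and other real-world effects.

```python
"""Rough scan-throughput bound: local disks vs. network-attached storage.

All hardware figures are hypothetical placeholders, not measurements.
"""


def local_read_bound_gbps(workers: int, disks_per_worker: int, disk_mb_s: float) -> float:
    """Aggregate local-disk read bandwidth in Gbit/s.

    Assumes perfect data locality and every disk streaming at full speed at once.
    """
    total_mb_s = workers * disks_per_worker * disk_mb_s
    return total_mb_s * 8 / 1000  # MB/s -> Gbit/s (decimal units)


def remote_read_bound_gbps(workers: int, nic_gbps: float, storage_side_gbps: float) -> float:
    """Remote read bandwidth: capped by worker NICs and by the storage side's network."""
    return min(workers * nic_gbps, storage_side_gbps)


def scan_seconds(dataset_tb: float, gbps: float) -> float:
    """Time in seconds to read `dataset_tb` terabytes at `gbps` Gbit/s."""
    return dataset_tb * 8000 / gbps  # 1 TB = 8000 Gbit (decimal units)


if __name__ == "__main__":
    workers = 20                                          # hypothetical compute nodes
    local = local_read_bound_gbps(workers, 12, 150.0)     # 12 disks x 150 MB/s each
    remote = remote_read_bound_gbps(workers, 10.0, 80.0)  # 10 GbE per worker, 80 Gbit/s storage side

    for label, bw in (("local HDFS", local), ("network-attached", remote)):
        print(f"{label}: ~{bw:.0f} Gbit/s, 10 TB scan in ~{scan_seconds(10, bw) / 60:.0f} min")
```

With these placeholder figures the local-disk bound (288 Gbit/s) is several times the network-attached bound (80 Gbit/s); with real hardware the gap may be smaller or larger, which is why measuring your own workloads (e.g. Impala queries) matters.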

In terms of reliability and stability, you can argue either way, depending on your architecture. However, a multi-datacenter deployment is likely to be easier to realize with Isilon, thanks to its enterprise-grade replication and failover capabilities.

In terms of storage space efficiency, Isilon has the advantage. However, its higher cost compared with JBOD-based HDFS might make this point irrelevant.
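
To make the storage-efficiency point concrete, here is a back-of-the-envelope sketch. It assumes HDFS's default replication factor of 3 and models Isilon protection as a generic N+M erasure-coding layout; the 16+2 layout and the 500 TB data volume are hypothetical examples, and actual OneFS efficiency depends on the configured protection level and cluster size.

```python
"""Back-of-the-envelope usable-capacity comparison (illustrative only)."""


def replication_efficiency(replicas: int) -> float:
    """Fraction of raw capacity usable when every block is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return 1.0 / replicas


def erasure_coding_efficiency(data_units: int, parity_units: int) -> float:
    """Fraction of raw capacity usable with an N+M erasure-coding layout."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need data_units >= 1 and parity_units >= 0")
    return data_units / (data_units + parity_units)


def raw_capacity_needed(usable_tb: float, efficiency: float) -> float:
    """Raw capacity (TB) required to hold `usable_tb` of data at a given efficiency."""
    return usable_tb / efficiency


if __name__ == "__main__":
    usable_tb = 500.0  # hypothetical amount of data to store

    hdfs_eff = replication_efficiency(3)       # HDFS default dfs.replication = 3
    ec_eff = erasure_coding_efficiency(16, 2)  # hypothetical 16+2 layout

    print(f"HDFS 3x replication: {hdfs_eff:.0%} usable, "
          f"{raw_capacity_needed(usable_tb, hdfs_eff):.0f} TB raw")
    print(f"16+2 erasure coding: {ec_eff:.0%} usable, "
          f"{raw_capacity_needed(usable_tb, ec_eff):.0f} TB raw")
```

With these placeholder numbers, 3x replication needs 1500 TB of raw disk to hold 500 TB of data, while the 16+2 layout needs about 562.5 TB; that gap still has to be weighed against the price per raw TB of each platform.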

For scalability, I guess it again depends on your organizational setup. You can easily scale out Isilon by buying more nodes from EMC, and there are certainly very large Isilon deployments out there. On the other hand, scaling HDFS is not hard either and can also support huge deployments.

In the end, it is a trade-off between higher costs but easier management with Isilon and lower costs but more effort with HDFS.

This is my personal opinion, and both EMC and Cloudera may have stronger arguments for their respective storage (e.g. [EMC link]). You can also look for the latest announcement on the blog.

Regards, Benjamin