Using shared storage for Hadoop

Contributor

I was wondering whether any benchmark results are available for Hadoop clusters that use shared storage instead of DAS.

I have heard that some people install HDP in a shared storage environment. I know this goes against Hadoop's data locality design, but I was wondering what the effect is in practice.

2 REPLIES

Re: Using shared storage for Hadoop

Super Guru

@Ali Nazemian You can definitely use shared storage with Hadoop. Performance will vary with the storage type. For example, EMC offers a product line called ECS. During my testing I found it was on par with DAS, although the runtime configurations were slightly different. EMC Isilon is another shared storage product with decent performance. Comparing shared storage with DAS is a loaded question, because not all hardware is alike. For example, if your DAS uses 5k RPM spinning disks, you may find that many shared storage vendors beat that type of disk. It also depends on the number of cores and the amount of RAM. That is why you don't see many articles with apples-to-apples comparisons. In my experience, shared storage is becoming common for Hadoop and can handle a fair number of use cases. If you have tight SLAs, I would go with DAS on SSDs or 10k RPM disks. It's all about the use case.

I did some "apples to apples" testing, as close as I could get, between shared storage and DAS.

Amazon (shared storage) performance numbers:

https://community.hortonworks.com/content/kbentry/44315/teragen-terasort-and-teravalidate-performanc...

Bigstep (DAS storage) performance numbers:

https://community.hortonworks.com/content/kbentry/51648/teragen-terasort-and-teravalidate-performanc...

Both use similar CPU and RAM with the same number of nodes. I hope this helps you understand the possible performance difference.
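When comparing TeraGen/TeraSort runs like the two linked above, it can help to normalize the elapsed times to per-node throughput so that clusters are compared on the same footing. The following is a minimal sketch, not part of either linked article: the function names are hypothetical, and the only built-in assumption is TeraGen's fixed 100-byte row size. It contains no measured results; you would plug in the row counts and run times from your own jobs.

```python
# Hypothetical helpers for comparing TeraSort-style runs across clusters.
# Nothing here comes from the linked articles; supply your own measurements.

TERAGEN_ROW_BYTES = 100  # TeraGen writes fixed 100-byte rows


def throughput_mb_per_s(data_bytes: int, elapsed_s: float, nodes: int = 1) -> float:
    """Return throughput in MB/s per node (1 MB = 1,000,000 bytes)."""
    if elapsed_s <= 0 or nodes <= 0:
        raise ValueError("elapsed_s and nodes must be positive")
    return data_bytes / 1_000_000 / elapsed_s / nodes


def relative_runtime(baseline_s: float, candidate_s: float) -> float:
    """Return candidate run time as a multiple of the baseline run time.

    A value above 1.0 means the candidate was slower than the baseline.
    """
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("run times must be positive")
    return candidate_s / baseline_s


def teragen_bytes(rows: int) -> int:
    """Return the data set size in bytes for a TeraGen job with `rows` rows."""
    return rows * TERAGEN_ROW_BYTES
```

For example, you could call `throughput_mb_per_s(teragen_bytes(rows), elapsed_s, nodes)` once for the DAS run and once for the shared storage run, then use `relative_runtime` on the two elapsed times. This only makes sense when, as above, CPU, RAM, and node count are similar on both sides.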

Re: Using shared storage for Hadoop

Contributor

I am investigating the performance drop with similar use cases. Suppose you have the option of using an IBM XIV shared storage server