HDFS Heterogeneous Storage - Using AWS S3 as storage tier

Hi,

Is it possible to use AWS S3 as a storage tier within HDFS Heterogeneous Storage? If so, any insight would be greatly appreciated.

Accepted Solutions

Re: HDFS Heterogeneous Storage - Using AWS S3 as storage tier

Hi @Andrew Watson, HDFS and S3 are distinct file systems. Today there is no way to use S3 as a storage tier within HDFS. You can use the S3A file system, which is bundled in the Apache Hadoop distributions, to store data in S3. However, your application (or administrator) would have to make a conscious decision to use either HDFS or S3A for each piece of data.

You may find HDFS-9806 interesting. It is a proposal from Microsoft to use alternate file systems such as Amazon S3 or Microsoft Azure as storage types within HDFS. It sounds like it addresses exactly your use case.
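
To illustrate what making that decision explicitly can look like, here is a minimal Python sketch. It assumes only that Hadoop picks the file system implementation from the URI scheme (hdfs:// or s3a://). The NameNode address, the bucket name, and the helper functions are hypothetical and are not part of any Hadoop API.

```python
from urllib.parse import urlparse

# Hypothetical locations, for illustration only.
HDFS_ROOT = "hdfs://namenode.example.com:8020/data"  # assumed NameNode address
S3A_ROOT = "s3a://example-bucket/data"               # assumed bucket name


def target_uri(relative_path: str, cold: bool) -> str:
    """Build the full URI for a dataset.

    The caller decides where the data lives: cold data goes to S3 through
    the S3A connector, everything else stays on HDFS. HDFS does not make
    this choice for you.
    """
    root = S3A_ROOT if cold else HDFS_ROOT
    return f"{root}/{relative_path.lstrip('/')}"


def backing_store(uri: str) -> str:
    """Report which store a URI points at, based only on its scheme."""
    scheme = urlparse(uri).scheme
    if scheme == "hdfs":
        return "HDFS"
    if scheme == "s3a":
        return "S3 (via S3A)"
    raise ValueError(f"unexpected scheme: {scheme!r}")


if __name__ == "__main__":
    for path, cold in [("hot/events", False), ("archive/2015", True)]:
        uri = target_uri(path, cold)
        print(backing_store(uri), uri)
```

The resulting URIs can be passed to anything that accepts a Hadoop path, for example hadoop fs -ls or hadoop distcp, provided the S3A connector and credentials are configured on the cluster.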

REPLIES

Re: HDFS Heterogeneous Storage - Using AWS S3 as storage tier

Have you looked at Alluxio as a virtual layer over HDFS and S3?

Re: HDFS Heterogeneous Storage - Using AWS S3 as storage tier

Sanjay Radia recently presented a new concept to be introduced into HDFS (Hadoop 3) called Storage Containers. Storage Containers are an extensibility mechanism that will allow HDFS to manage object storage, such as S3. Watch https://www.youtube.com/watch?v=SdmJHmpvp7E and see "EVOLVING HDFS TO A GENERALIZED DISTRIBUTED STORAGE SUBSYSTEM".
