HDFS Heterogeneous Storage: Using AWS S3 as a Storage Tier

Hi,

Is it possible to use AWS S3 as a storage tier within HDFS Heterogeneous Storage? If so, any insight would be greatly appreciated.

Accepted solution

Hi @Andrew Watson, HDFS and S3 are distinct file systems, and today there is no way to use S3 as a storage tier within HDFS. You can use the S3A file system, which is bundled with Apache Hadoop distributions, to store data in S3. However, your application (or administrator) has to make a conscious decision to use either HDFS or S3A.
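
To make that "conscious decision" concrete: in Hadoop, the file system that serves a path is selected by the path's URI scheme (for example `hdfs://` or `s3a://`), not by an HDFS storage policy. The Python snippet below is only a toy model of that dispatch, not Hadoop's actual code. The `SCHEME_TO_IMPL` table, the `resolve_filesystem` helper and the bucket name are hypothetical; the two class names are the real HDFS and S3A implementations. It also assumes that `fs.defaultFS` points at HDFS, so that scheme-less paths resolve there.

```python
from urllib.parse import urlparse

# Toy model (hypothetical, for illustration only) of how Hadoop maps a URI
# scheme to a FileSystem implementation. Real Hadoop resolves this from
# configuration (e.g. fs.s3a.impl) or from implementations found on the classpath.
SCHEME_TO_IMPL = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}


def resolve_filesystem(path: str, default_scheme: str = "hdfs") -> str:
    """Return the FileSystem class name that would serve `path`.

    Paths without a scheme fall back to `default_scheme`, standing in for
    fs.defaultFS (assumed here to be HDFS).
    """
    scheme = urlparse(path).scheme or default_scheme
    if scheme not in SCHEME_TO_IMPL:
        raise ValueError(f"no FileSystem registered for scheme {scheme!r}")
    return SCHEME_TO_IMPL[scheme]


if __name__ == "__main__":
    # The application picks the store by choosing the URI, path by path.
    print(resolve_filesystem("hdfs://namenode:8020/data/events"))
    print(resolve_filesystem("s3a://my-bucket/data/events"))
    print(resolve_filesystem("/data/events"))  # falls back to HDFS
```

Because the scheme alone selects the file system, an HDFS storage policy cannot route data to S3. Such a policy only decides which DataNode storage types (such as DISK, SSD or ARCHIVE) hold the block replicas of a path that is already on HDFS.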

You may find HDFS-9806 interesting. It is a proposal from Microsoft to use alternate file systems, such as Amazon S3 or Microsoft Azure, as storage types within HDFS. It sounds like it addresses exactly your use case.

Replies

Have you looked at Alluxio as a virtual layer over HDFS and S3?

Sanjay Radia recently presented Storage Containers, a new concept to be introduced into HDFS in Hadoop 3. Storage Containers are an extensibility mechanism that will allow HDFS to manage object storage such as S3. Watch the talk "EVOLVING HDFS TO A GENERALIZED DISTRIBUTED STORAGE SUBSYSTEM" at https://www.youtube.com/watch?v=SdmJHmpvp7E.