HDFS Heterogeneous Storage - Using AWS S3 as storage tier
Labels: Apache Hadoop
Created ‎07-08-2016 01:36 PM
Hi,
Is it possible to use AWS S3 as a storage tier within HDFS Heterogeneous Storage? If so, any insight would be greatly appreciated.
Created ‎07-08-2016 06:51 PM
Hi @Andrew Watson, HDFS and S3 are distinct file systems. Today there is no way to use S3 as a storage tier within HDFS. You can use the S3A file system, which is bundled in the Apache Hadoop distributions, to store data in S3. However, your application (or administrator) would have to make a conscious decision to use either HDFS or S3A.
You may find HDFS-9806 interesting. This is a proposal from Microsoft to use alternate file systems like Amazon S3 or Microsoft Azure as storage types within HDFS. It sounds like it addresses your use case exactly.
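To make the "conscious decision" between HDFS and S3A concrete, here is a minimal sketch of an application picking a file system per path through the URI scheme. The NameNode address, the bucket name, and the "hot"/"cold" labels are made-up placeholders, and the tier label is purely an application-level convention, not an HDFS storage policy.

```python
from urllib.parse import urlparse

# Hypothetical endpoints: substitute your own NameNode address and S3 bucket.
HDFS_ROOT = "hdfs://namenode.example.com:8020"
S3A_ROOT = "s3a://example-archive-bucket"


def storage_uri(path: str, tier: str) -> str:
    """Return a fully qualified URI for `path` on the chosen tier.

    `tier` is an application-level label ("hot" or "cold"). HDFS itself
    knows nothing about data that the application places in S3.
    """
    roots = {"hot": HDFS_ROOT, "cold": S3A_ROOT}
    if tier not in roots:
        raise ValueError(f"unknown tier: {tier!r}")
    return roots[tier] + "/" + path.lstrip("/")


def filesystem_of(uri: str) -> str:
    """Report the URI scheme, which decides the file system implementation."""
    return urlparse(uri).scheme


if __name__ == "__main__":
    for tier in ("hot", "cold"):
        uri = storage_uri("/logs/2016/07", tier)
        print(tier, filesystem_of(uri), uri)
```

The resulting URIs are the kind you would pass to `hadoop fs -ls` or `hadoop distcp`. Moving data between the two locations stays an explicit copy that you or your job has to trigger.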
Created ‎07-08-2016 05:07 PM
Have you looked at Alluxio as a virtual layer over HDFS and S3?
Created ‎07-11-2016 06:23 PM
Sanjay Radia recently presented a new concept to be introduced into HDFS (Hadoop 3) called Storage Containers. Storage Containers are an extensibility mechanism that will allow HDFS to manage object storage, such as S3. Watch https://www.youtube.com/watch?v=SdmJHmpvp7E and see "EVOLVING HDFS TO A GENERALIZED DISTRIBUTED STORAGE SUBSYSTEM".
