Why can't Object Stores like Amazon S3 be used as the fs.defaultFS?
Created 08-06-2019 06:41 PM
I've read https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_cloud-data-access/content/intro.html, which recommends against using a cloud storage connector as the default filesystem in place of HDFS. Can someone explain the reasoning for why these object stores can't be set as the defaultFS, and which services wouldn't work or would have issues?
Created 08-06-2019 07:09 PM
Blob stores do not have the same semantics as file systems. HBase relies on very specific semantics with respect to concurrency and atomic operations which most blob stores (including S3) do not provide.
One example: a move of some "directory" in an S3 bucket is not atomic whereas this is atomic in HDFS.
HBase will definitely not work correctly if you configure hbase.rootdir to point at S3 via the S3A connector in Hadoop. EMR has proprietary code in its S3 filesystem access layer, separate from S3A, which somehow does not suffer from this issue.
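To make the rename point concrete, here is a minimal Python sketch. It is a toy model, not real S3 or S3A code: `ToyObjectStore`, `rename_prefix`, and the `fail_after` knob are all hypothetical names invented for illustration. It assumes that a "directory" rename on an object store is emulated as a copy plus delete of each object under the prefix, so a client that dies partway through leaves files split between the old and new locations.

```python
class ToyObjectStore:
    """A flat key -> bytes map standing in for a bucket (hypothetical, not an S3 client)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def list_prefix(self, prefix):
        # Object stores have no real directories, only keys sharing a prefix.
        return sorted(k for k in self.objects if k.startswith(prefix))

    def rename_prefix(self, src, dst, fail_after=None):
        """Emulate a 'directory' rename as one copy + delete per object.

        fail_after is an illustrative knob: raise once that many objects
        have been moved, to mimic a client crash in the middle of the rename.
        """
        moved = 0
        for key in self.list_prefix(src):
            if fail_after is not None and moved >= fail_after:
                raise RuntimeError("client died mid-rename")
            self.objects[dst + key[len(src):]] = self.objects[key]  # copy
            del self.objects[key]                                   # delete
            moved += 1


store = ToyObjectStore()
for name in ("a", "b", "c"):
    store.put(f"tmp/region1/{name}", b"data")

try:
    store.rename_prefix("tmp/region1/", "data/region1/", fail_after=2)
except RuntimeError:
    pass

# The "directory" now exists partially in both places.
print(store.list_prefix("tmp/"))   # ['tmp/region1/c']
print(store.list_prefix("data/"))  # ['data/region1/a', 'data/region1/b']
```

On HDFS the same rename is a single metadata operation on the NameNode, so other clients see either the old path or the new one, never this half-moved state.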
Created 08-07-2019 05:07 AM
Thanks @Josh Elser for your response. I did notice the HBase Master failing to stay up when the cluster was using a blob store (Amazon S3 and DellEMC's ECS) as the default filesystem, which might be because HBase needs HDFS to replicate the WAL. Do you know of other services that would not work in such a use case?
Created 08-07-2019 04:29 PM
I would start by assuming that no service that relies on HDFS can simply use S3 directly. S3Guard can likely bridge the gap for most systems (HBase is an exception), but I cannot tell you the requirements for every service in existence.