Why can't Object Stores like Amazon S3 be used as the fs.defaultFS?

I've read https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_cloud-data-access/content/intro.html, which says it is not recommended to use a cloud storage connector in place of HDFS as the cluster's default filesystem. Can someone explain why these object stores can't be set as fs.defaultFS, and which services wouldn't work or would have issues?

Accepted Solution

Re: Why can't Object Stores like Amazon S3 be used as the fs.defaultFS?

Blob stores do not have the same semantics as file systems. HBase relies on very specific semantics with respect to concurrency and atomic operations which most blob stores (including S3) do not provide.

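One example: renaming a "directory" in an S3 bucket is not atomic, whereas it is atomic in HDFS. S3 has no real directories, only object keys that share a prefix, so the S3A connector has to copy each object to its new key and then delete the original. If the process fails partway through, some files exist under the old path and some under the new one.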
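To make the rename point concrete, here is a minimal Python toy model, assuming both namespaces are held in memory. The classes `HierarchicalFS` and `FlatObjectStore`, the `fail_after` parameter, and the example paths are hypothetical illustrations, not HDFS, S3, or S3A APIs; the real NameNode and S3A code are far more involved. The sketch only shows why a rename built from per-object copy and delete can leave a half-moved "directory", while a rename that re-links one namespace entry cannot.

```python
# Toy model (hypothetical classes, not real Hadoop or S3A code) contrasting a
# directory rename in a hierarchical namespace with one on a flat object store.


class HierarchicalFS:
    """Directories are real namespace entries: {dir name: {file name: data}}."""

    def __init__(self):
        self.dirs = {}

    def mkdir(self, name):
        self.dirs.setdefault(name, {})

    def put(self, dirname, filename, data):
        self.dirs[dirname][filename] = data

    def rename_dir(self, src, dst):
        if dst in self.dirs:
            raise FileExistsError(dst)
        # One namespace update re-links the whole directory; there is no
        # intermediate state in which only some of its files have moved.
        self.dirs[dst] = self.dirs.pop(src)


class FlatObjectStore:
    """Objects keyed by full path; a "directory" is only a shared key prefix."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def rename_dir(self, src, dst, fail_after=None):
        """Emulate a directory rename as copy-then-delete, one object at a time.

        fail_after (hypothetical) simulates the client dying after that many
        objects have been moved.
        """
        src_prefix = src.rstrip("/") + "/"
        dst_prefix = dst.rstrip("/") + "/"
        # Snapshot the matching keys first so the dict can be mutated safely.
        keys = sorted(k for k in self.objects if k.startswith(src_prefix))
        for moved, key in enumerate(keys):
            if fail_after is not None and moved == fail_after:
                raise RuntimeError(f"client died after moving {moved} objects")
            new_key = dst_prefix + key[len(src_prefix):]
            self.objects[new_key] = self.objects[key]  # copy to the new key
            del self.objects[key]  # then delete the old key


if __name__ == "__main__":
    store = FlatObjectStore()
    for i in range(3):
        store.put(f"hbase/data/region{i}", b"cells")
    try:
        store.rename_dir("hbase/data", "hbase/archive", fail_after=2)
    except RuntimeError as err:
        print(err)
    # The "directory" is now split across two prefixes.
    print(sorted(store.objects))
```

Running it prints the simulated crash message followed by `['hbase/archive/region0', 'hbase/archive/region1', 'hbase/data/region2']`: a reader such as HBase would now see a directory that is neither fully at the old path nor fully at the new one.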

HBase will definitely not work correctly if you configure hbase.rootdir to point at S3 through Hadoop's S3A connector. EMR has proprietary code in its S3 filesystem access layer, separate from S3A, which somehow does not suffer from this issue.

Re: Why can't Object Stores like Amazon S3 be used as the fs.defaultFS?

Thanks @Josh Elser for your response. I did notice the HBase Master failing to stay up when the cluster used a blob store (Amazon S3 and Dell EMC's ECS) as the default filesystem, which might be because HBase relies on HDFS to replicate its write-ahead log (WAL). Do you know of other services that would not work in such a use case?

Re: Why can't Object Stores like Amazon S3 be used as the fs.defaultFS?

I would start by assuming that no service which relies on HDFS can simply use S3 directly. S3Guard, which adds a consistent metadata store in front of S3A, can likely bridge the gap for most systems (HBase is an exception), but I cannot tell you the requirements of every service in existence.