What is EMRFS? Is it a file system in AWS that is different from S3? Is the sqoop import command different when EMRFS is used or do you still refer to the "target" as S3?

1 ACCEPTED SOLUTION

EMRFS is an Amazon-proprietary replacement for HDFS for cluster storage.

We work on S3A, the open source client for reading and writing data in S3; it is not something you can replace HDFS with. In HDP and HDCloud clusters running in EC2, you must use HDFS for the cluster filesystem, with the S3A client to read data from S3 and write it back at the end of a workflow.
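
To tie this back to the sqoop question, here is a minimal sketch of how the import target might look in each case. It assumes the S3A connector is already configured with credentials on the cluster. The JDBC URL, table name, and bucket name are hypothetical placeholders; `--connect`, `--table`, and `--target-dir` are standard Sqoop import options. The helper only builds the argument list and does not run Sqoop.

```python
import shlex


def sqoop_import_args(jdbc_url, table, target_dir):
    """Build the argv for a `sqoop import` run (does not execute it)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
    ]


# Hypothetical values, for illustration only.
JDBC_URL = "jdbc:mysql://db.example.com/sales"

# Import into the cluster filesystem (HDFS).
to_hdfs = sqoop_import_args(JDBC_URL, "orders", "hdfs:///data/orders")

# Import to S3 through the S3A client: only the target URI changes.
to_s3 = sqoop_import_args(JDBC_URL, "orders", "s3a://example-bucket/data/orders")

print(shlex.join(to_hdfs))
print(shlex.join(to_s3))
```

The command itself is the same in both cases; what changes is the filesystem URI passed as the target. On EMR the S3 URI would normally use the `s3://` scheme, which EMRFS handles, rather than `s3a://`.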

We are doing a lot of work on S3A performance, much of which is available in HDCloud and HDP 2.5.

Note that you can use S3A for remote access to S3 data: between S3 regions and from physical clusters wherever they live. This lets you use S3 as a backup repository of your Hadoop cluster data.
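
As a rough illustration of the backup idea, the sketch below builds a `hadoop distcp` command that copies an HDFS directory to an S3 bucket via S3A. The paths and bucket name are hypothetical, and it assumes S3A credentials are already set up; the helper only assembles the argument list, it does not launch the copy.

```python
def distcp_backup_args(hdfs_path, bucket, prefix):
    """Build the argv for copying an HDFS path to S3 through S3A."""
    # Normalise the prefix so the destination has no doubled or trailing slashes.
    dest = f"s3a://{bucket}/{prefix.strip('/')}"
    return ["hadoop", "distcp", hdfs_path, dest]


# Hypothetical source path and bucket, for illustration only.
backup_cmd = distcp_backup_args("hdfs:///data/orders", "example-backups", "/2016/orders/")
print(" ".join(backup_cmd))
```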

3 REPLIES

Thank you very much for the information. It helps a great deal.
