What is EMRFS? Is it a file system in AWS that is different from S3? Is the sqoop import command different when EMRFS is used or do you still refer to the "target" as S3?
Labels: Apache Sqoop
Created ‎03-03-2017 12:42 PM
Created ‎03-03-2017 10:04 PM
EMRFS is an Amazon-proprietary replacement for HDFS for cluster storage.
We work on S3A, the open source client for reading and writing data in S3; it is not something you can replace HDFS with. In HDP and HDCloud clusters running in EC2, you must use HDFS for the cluster filesystem, with the S3A client reading data from S3 and writing it back at the end of a workflow.
We are doing a lot of work on S3A performance, much of which is available in HDCloud and HDP 2.5.
Note that you can also use S3A for remote access to S3 data: across S3 regions, and from physical clusters wherever they live. This lets you use S3 as a backup repository for your Hadoop cluster data.
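On the Sqoop part of the question, here is a rough sketch rather than a tested recipe. It assumes the import command itself stays the same and only the URI scheme of the target changes: `s3a://` for the S3A client, and `s3://` for EMRFS on EMR. The bucket, path, JDBC URL, and table names are hypothetical placeholders, the helper functions are illustrative only, and credentials are left out entirely.

```python
import shlex

# Assumed URI schemes: "s3a" for the open source S3A client,
# "s3" for EMRFS on an EMR cluster.
SCHEMES = {"s3a": "s3a", "emrfs": "s3"}


def s3_uri(client, bucket, key):
    """Return the S3 URI for bucket/key as seen by the given client."""
    return f"{SCHEMES[client]}://{bucket}/{key.lstrip('/')}"


def sqoop_import_args(jdbc_url, table, target_dir):
    """Build the argument list for a basic `sqoop import`.

    Only --connect, --table and --target-dir are set; authentication
    and tuning options are deliberately omitted from this sketch.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
    ]


def distcp_args(src, dst):
    """Build a `hadoop distcp` call, e.g. to copy HDFS output to S3."""
    return ["hadoop", "distcp", src, dst]


def as_command_line(args):
    """Render an argument list as a single shell-quoted string."""
    return shlex.join(args)
```

For example, `as_command_line(sqoop_import_args("jdbc:mysql://db.example.com/sales", "orders", s3_uri("s3a", "example-bucket", "warehouse/orders")))` gives a command whose target is `s3a://example-bucket/warehouse/orders`; passing `"emrfs"` instead changes only the scheme to `s3://`. `distcp_args` covers the other pattern described above: keep working data in HDFS and copy the results to S3 at the end of the workflow.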
Created ‎03-06-2017 12:07 PM
Thank you very much for the information. It helps a great deal.
Created ‎03-07-2017 10:26 AM
This link may help you understand the options:
http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-file-systems.html
