HDCloud with AWS S3
Created 01-01-2017 03:30 AM
- In the HDCloud installation steps, I could not find any steps to configure S3. How do I configure it?
- The documentation also mentions that it is tuned to work with Amazon S3. What are the major tuning and optimizations that have been bundled with it?
- Has HDCloud's tuned Amazon S3 support replaced HDFS? In other words, is the data set loaded into node memory directly from S3 for computation, instead of first being moved to HDFS?
Created 01-03-2017 05:37 PM
When you are creating a cluster, the "Instance Role" parameter allows you to configure S3 access. By default, a new S3 role is created to grant you access to S3 data in your AWS account.
See "Instance Role" in Step 7 at http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/create/i...
In addition, there are ways to authenticate with S3 using keys or tokens: http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-secur.... @Ram Venkatesh
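As a sketch of the key-based option, the Hadoop S3A connector accepts AWS credentials through `core-site.xml` properties (the property names are standard S3A settings; the key values shown are placeholders, not real credentials):

```xml
<!-- core-site.xml: S3A key-based authentication (placeholder values) -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
```

In practice the instance-role approach described above is usually preferable, since it avoids storing long-lived keys in configuration files.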
Created 01-01-2017 10:23 AM
Documentation for using S3 is available on the HDC doc page. For instance, there are pages on using S3 with Hive and with Spark, as well as a page on performance tuning.
Regarding your last question, S3 does not replace HDFS. HDFS is still the default storage system, as explained here:
While Amazon S3 can be used as the source and store for persistent data, it cannot be used as a direct replacement for a cluster-wide filesystem such as HDFS. This is important to know, as the fact that it is accessed with the same APIs can be misleading.
That said, you can absolutely access data directly from S3 without copying it to HDFS. There are examples of how to do this in the Hive and Spark doc pages mentioned above.
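To illustrate the direct-access pattern (the bucket name and path here are hypothetical), a Hive external table can point its `LOCATION` at an `s3a://` URI, so queries read straight from S3 with no HDFS copy step:

```sql
-- Hypothetical bucket and path; the data stays in S3 and is read in place.
CREATE EXTERNAL TABLE logs (
  ts      STRING,
  level   STRING,
  message STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3a://my-example-bucket/logs/';
```

The same `s3a://` URI scheme works as an input path in Spark jobs, which is the pattern the Hive and Spark doc pages linked above walk through.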