Created 01-01-2017 03:30 AM
Created 01-03-2017 05:37 PM
When you are creating a cluster, the "Instance Role" parameter allows you to configure S3 access. By default, a new S3 role is created to grant you access to S3 data in your AWS account.
See "Instance Role" in Step 7 at http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/create/i...
In addition, there are ways to authenticate with S3 using keys or tokens: http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-secur.... @Ram Venkatesh
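As a rough sketch of the key-based option: the standard Hadoop S3A credential properties can be set in `core-site.xml`. The property names below are the usual S3A ones; the values are placeholders, and in practice the instance role (or a credential provider) is preferable to keys in plain text.

```xml
<!-- Placeholder values only; do not commit real keys in plain text. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
```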
Created 01-01-2017 10:23 AM
Documentation for using S3 is available on the HDC documentation page. For instance, there are pages on using S3 with Hive and with Spark, as well as a page on performance tuning.
Regarding your last question, S3 won't replace HDFS. HDFS is still the default storage system, as explained here:
While Amazon S3 can be used as the source and store for persistent data, it cannot be used as a direct replacement for a cluster-wide filesystem such as HDFS. This is important to know, as the fact that it is accessed with the same APIs can be misleading.
That being said, you can absolutely access data directly from S3 without copying it to HDFS. There are examples of how to do this in the Hive and Spark documentation pages mentioned above.
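As an illustration of the Hive case (table and bucket names here are made up for the example): an external table whose LOCATION is an `s3a://` path lets queries read the data in place, with no copy into HDFS.

```sql
-- Hypothetical table over data that stays in S3; nothing is copied to HDFS.
CREATE EXTERNAL TABLE web_logs (
  ts STRING,
  url STRING,
  status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://my-example-bucket/logs/';

-- Queries then read directly from S3:
SELECT status, COUNT(*) FROM web_logs GROUP BY status;
```

Spark works the same way: pointing a read at an `s3a://` path accesses the data directly, as shown in the Spark documentation page linked above.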