HDCloud with AWS S3

  1. In the HDCloud installation steps, I could not find any steps for configuring S3. How do I configure it?
  2. The documentation also mentions that HDCloud is tuned to work with Amazon S3. What are the major tuning changes and optimizations bundled with it?
  3. Does HDCloud, with its S3 tuning, replace HDFS? In other words, are datasets loaded directly from S3 into the memory of the nodes for computation, instead of first being copied to HDFS?
Accepted solution

Hi @Vivek Sharma

When you are creating a cluster, the "Instance Role" parameter allows you to configure S3 access. By default, a new S3 role is created to grant you access to S3 data in your AWS account.

See "Instance Role" in Step 7 at http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/create/i...
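As a rough illustration of what such a role grants, the sketch below builds a minimal IAM policy document for a single bucket. It is a hypothetical example: the bucket name `my-hdc-data` and the `s3_access_policy` helper are placeholders, and the policy HDCloud actually attaches to its default role may differ. With a role like this on the instance profile, the S3A connector can obtain temporary credentials from EC2 instance metadata, so no keys have to be written into the cluster configuration.

```python
import json


def s3_access_policy(bucket: str, read_only: bool = False) -> dict:
    """Build a minimal IAM policy document granting access to one S3 bucket.

    Hypothetical example: the policy on HDCloud's default instance role may differ.
    """
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions are granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-hdc-data" is a placeholder bucket name.
    print(json.dumps(s3_access_policy("my-hdc-data", read_only=True), indent=2))
```

Splitting the statement in two matters because `s3:ListBucket` applies to the bucket ARN, while object actions apply to the `bucket/*` object ARNs.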

In addition, there are ways to authenticate with S3 using keys or tokens: http://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-secur.... @Ram Venkatesh
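For the key- or token-based options, the S3A connector reads credentials from Hadoop configuration properties. The sketch below renders them as a `core-site.xml` fragment. The property names are the standard S3A ones, but the `s3a_credentials_xml` helper and the placeholder values are hypothetical, and session-token support via `TemporaryAWSCredentialsProvider` assumes a Hadoop version whose S3A connector includes it.

```python
import xml.etree.ElementTree as ET


def s3a_credentials_xml(access_key, secret_key, session_token=None):
    """Render S3A credential properties as a Hadoop <configuration> fragment.

    Hypothetical helper; the values passed in are placeholders.
    """
    props = {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }
    if session_token is not None:
        # Temporary (STS) credentials need the session token plus a provider that reads it.
        props["fs.s3a.session.token"] = session_token
        props["fs.s3a.aws.credentials.provider"] = (
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"
        )
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(s3a_credentials_xml("ACCESS_KEY", "SECRET_KEY", "SESSION_TOKEN"))
```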

Replies

Hi @Vivek Sharma

Documentation for using S3 is available on the HDC documentation page. For instance, there are pages on using S3 with Hive and Spark, as well as a page on performance tuning.
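Purely to show the shape of such settings, the sketch below builds `spark-submit` arguments that pass a few standard S3A properties through Spark's `spark.hadoop.*` prefix, which copies them into the Hadoop configuration. The values are illustrative assumptions, not HDCloud's shipped defaults; `spark_s3a_conf_args` is a hypothetical helper; and `fs.s3a.experimental.input.fadvise` is only recognized by newer Hadoop releases.

```python
def spark_s3a_conf_args(overrides=None):
    """Build spark-submit --conf arguments for a few S3A tuning properties.

    Hypothetical helper; the default values are illustrative, not HDCloud's.
    """
    props = {
        "fs.s3a.fast.upload": "true",  # upload blocks while the output is still being written
        "fs.s3a.connection.maximum": "100",  # size of the HTTP connection pool to S3
        "fs.s3a.experimental.input.fadvise": "random",  # favor seek-heavy reads
    }
    if overrides:
        props.update(overrides)
    args = []
    for key, value in sorted(props.items()):
        # Spark copies "spark.hadoop.*" settings into the Hadoop configuration.
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit", *spark_s3a_conf_args()]))
```

Setting the fadvise policy to `random` generally helps columnar formats such as ORC and Parquet, which seek within files, while `sequential` suits jobs that read whole files.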

Regarding your last question, S3 won't replace HDFS. HDFS is still the default storage system, as explained here:

While Amazon S3 can be used as the source and store for persistent data, it cannot be used as a direct replacement for a cluster-wide filesystem such as HDFS. This is important to know, as the fact that it is accessed with the same APIs can be misleading.
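One concrete reason the shared API can mislead: HDFS renames a directory as a single metadata operation, whereas S3 has no real directories, so the S3A connector emulates a rename by copying each object and then deleting the original. The toy model below is a simplification, not the actual S3A code, and `ToyObjectStore` is hypothetical. It shows that the cost of a "directory" rename grows with the number of files and that a failure partway through leaves the data split across both prefixes.

```python
class ToyObjectStore:
    """Toy model of a flat key/value object store (not a real S3 client).

    A "directory" is only a shared key prefix.
    """

    def __init__(self):
        self.objects = {}
        self.requests = 0  # counts simulated per-object requests

    def put(self, key, data):
        self.objects[key] = data
        self.requests += 1

    def rename_prefix(self, src, dst, fail_after=None):
        """Emulate a directory rename: copy each object, then delete the original.

        fail_after simulates a crash after that many objects have been moved.
        """
        moved = 0
        # Materialize the key list first so the dict can be modified in the loop.
        for key in sorted(k for k in self.objects if k.startswith(src)):
            if fail_after is not None and moved == fail_after:
                raise RuntimeError("simulated failure mid-rename")
            self.objects[dst + key[len(src):]] = self.objects[key]  # copy request
            del self.objects[key]  # delete request
            self.requests += 2
            moved += 1
```

With three files, the rename costs six requests instead of one metadata update, and a simulated failure after the first file leaves `final/part-0` alongside `tmp/part-1` and `tmp/part-2`.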

That being said, you can access data directly from S3 without copying it to HDFS. The Hive and Spark documentation pages mentioned above include examples of how to do this.
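For Hive, reading S3 data in place typically means declaring an external table whose `LOCATION` is an S3 path. The sketch below only generates such a statement. The table name, columns, bucket, and the `external_table_ddl` helper are hypothetical, and it assumes the S3A connector (`s3a://` URIs).

```python
def external_table_ddl(table, columns, s3_location, fmt="ORC"):
    """Build a Hive DDL statement for a table whose data stays in S3.

    Hypothetical helper; columns is a list of (name, type) pairs.
    """
    if not s3_location.startswith("s3a://"):
        raise ValueError("expected an s3a:// URI")
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"STORED AS {fmt}\n"
        f"LOCATION '{s3_location}'"
    )


if __name__ == "__main__":
    print(external_table_ddl("sales", [("id", "INT"), ("amount", "DOUBLE")],
                             "s3a://my-bucket/sales/"))
```

In Spark, the equivalent is to pass the same `s3a://` path directly to a reader such as `spark.read.orc(...)`. In both cases the data is read from S3 without first being copied into HDFS.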
