How to attach EBS snapshots to Hadoop infrastructure? How can we make snapshot data available to Hadoop? Please provide any available documentation.


Re: How to attach ebs snapshots to hadoop infrastructure? How can we make snapshots data available to hadoop? Please provide any available documentation.

@Ram D It appears from Amazon's EBS documentation that you cannot directly access the data stored in an EBS snapshot. Instead, you must restore the snapshot data by creating a new EBS volume from the snapshot. After that, you can attach the volume to an EC2 instance, mount it in the usual way, and access it as an ordinary local file system.
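As a rough illustration of that restore sequence, here is a minimal Python sketch that only builds the AWS CLI and mount commands, without running them. All IDs, the availability zone, the device names, and the mount point are hypothetical placeholders, and the function name `restore_commands` is made up for this example. It assumes the AWS CLI is installed and that the volume is created in the same availability zone as the instance. The device name seen inside the OS may differ from the one passed to `attach-volume`, so check it (for example with `lsblk`) before mounting.

```python
import shlex


def restore_commands(snapshot_id, availability_zone, volume_id, instance_id,
                     device="/dev/sdf", os_device="/dev/xvdf",
                     mount_point="/mnt/ebs-restore"):
    """Return the commands that restore a snapshot to a mounted volume.

    volume_id is a placeholder: the real ID is only known from the output
    of the create-volume call, so substitute it before running later steps.
    """
    return [
        # 1. Create a new EBS volume from the snapshot.
        ["aws", "ec2", "create-volume",
         "--snapshot-id", snapshot_id,
         "--availability-zone", availability_zone],
        # 2. Wait until the new volume is ready to be attached.
        ["aws", "ec2", "wait", "volume-available", "--volume-ids", volume_id],
        # 3. Attach the volume to the instance.
        ["aws", "ec2", "attach-volume",
         "--volume-id", volume_id,
         "--instance-id", instance_id,
         "--device", device],
        # 4. Mount it as an ordinary local file system (run on the instance).
        ["sudo", "mkdir", "-p", mount_point],
        ["sudo", "mount", os_device, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    for cmd in restore_commands("snap-0123456789abcdef0", "us-east-1a",
                                "vol-0123456789abcdef0", "i-0123456789abcdef0"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them makes it easy to review each step, and to paste in the real volume ID once `create-volume` has returned it.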

As you know, a local volume on one server is not readily accessible to a whole cluster of Hadoop servers. You have two options. You could copy the data into HDFS, which will distribute it across the volumes used for DataNode storage in the cluster. Or you could copy the data into S3, which Hadoop servers within EC2 can access via the "s3" file system or, in HDP-2.3 or later, the "s3a" file system. See https://wiki.apache.org/hadoop/AmazonS3 for an overview, and https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-a... for configuration details. You can also search for "S3" within our Community Connection for useful articles about configuring and accessing S3 from Hadoop.
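The two copy options can be sketched in the same way. This is only an illustrative Python helper that builds the commands. The local path, HDFS directory, bucket name, and prefix are hypothetical, as are the function names. It assumes the `hdfs`, `hadoop`, and `aws` command-line tools are available on the node where the volume is mounted, and that S3 credentials are already configured for both the AWS CLI and the Hadoop "s3a" connector.

```python
import shlex


def hdfs_copy_commands(local_dir, hdfs_dir):
    """Commands to copy a mounted local directory into HDFS."""
    return [
        # Create the target directory (and parents) in HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Upload the local data; HDFS spreads the blocks across DataNodes.
        ["hdfs", "dfs", "-put", local_dir, hdfs_dir],
    ]


def s3_copy_commands(local_dir, bucket, prefix):
    """Commands to copy a mounted local directory into S3 and list it from Hadoop."""
    return [
        # Upload with the AWS CLI, which uses s3:// URLs.
        ["aws", "s3", "sync", local_dir, f"s3://{bucket}/{prefix}"],
        # Verify access from Hadoop through the s3a connector.
        ["hadoop", "fs", "-ls", f"s3a://{bucket}/{prefix}"],
    ]


if __name__ == "__main__":
    # Hypothetical paths and bucket name, for illustration only.
    commands = (hdfs_copy_commands("/mnt/ebs-restore/data", "/data/restored")
                + s3_copy_commands("/mnt/ebs-restore/data", "example-bucket", "restored"))
    for cmd in commands:
        print(shlex.join(cmd))
```

Note that the AWS CLI and Hadoop use different URL schemes for the same bucket: `s3://` for the CLI upload and `s3a://` when reading from Hadoop.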

Depending on the amount of data involved, you may need to consider the AWS cost of both the storage and the bandwidth used. If so, and if the EBS data being backed up is not itself Hadoop data, then you might consider using S3 as the original repository, rather than EBS with snapshots. Of course, that may not be feasible depending on your use case.

Re: How to attach ebs snapshots to hadoop infrastructure? How can we make snapshots data available to hadoop? Please provide any available documentation.

@Ram D, please review the answers to your question, and if they are acceptable, mark them as "accepted" so the responder can get credit. Otherwise, ask clarifying questions in the comments so you can get your question answered. Thanks.
