Does the Cloudera Quick Start for AWS support using EBS for persistent data storage?

Explorer

Hello,

I am using the AWS Quick Start to install a cluster. I would like to use EBS storage attached to my nodes to create persistent data storage across instance restarts. This is a cluster for education, and I do not want to leave it running all the time, but I also do not want to lose my data in HDFS when I stop the instances.

Is this supported?

Thank you,

David Bellion

Re: Does the Cloudera Quick Start for AWS support using EBS for persistent data storage?

Contributor

Hi David,

As of Director 2.1, Director cannot attach EBS volumes to instances and hence cannot set up clusters that use EBS for persistent storage. EBS support is on Director's roadmap, but until then, have you considered storing your data in S3 and having your cluster read and write directly from S3?
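
A minimal sketch of the S3 route, assuming Hadoop's S3A connector (`s3a://` URIs) and static access keys: the helper below builds a `core-site.xml` fragment with the standard `fs.s3a.access.key` and `fs.s3a.secret.key` properties (plus `fs.s3a.endpoint` if you need a specific regional endpoint). The function names, bucket, and key values are hypothetical placeholders; in Cloudera Manager these properties would more likely go into the core-site.xml safety valve than be written by hand.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def s3a_core_site(access_key: str, secret_key: str,
                  endpoint: Optional[str] = None) -> str:
    """Return a core-site.xml <configuration> fragment with S3A settings."""
    props = {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }
    if endpoint is not None:
        # Only needed when targeting a specific regional endpoint.
        props["fs.s3a.endpoint"] = endpoint
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


def s3a_uri(bucket: str, path: str) -> str:
    """Build an s3a:// URI that jobs can typically use in place of an hdfs:// path."""
    return f"s3a://{bucket}/{path.lstrip('/')}"
```

With such a fragment in place, something like `hadoop fs -ls s3a://my-bucket/data/` (bucket name hypothetical) should list the bucket. Data written to S3 outlives the instances, so the cluster can be stopped without losing it. On EC2, an IAM instance role may let you avoid putting keys in configuration files at all, depending on your Hadoop version's S3A credential support.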

Re: Does the Cloudera Quick Start for AWS support using EBS for persistent data storage?

Explorer

Hello Vinithra,

Thank you for the prompt reply. Can I attach the EBS volumes myself using AWS? I could then just stop and start the nodes as usual, and they would already have the EBS volumes attached.

I will also look into using S3.

Best regards,

David
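
Attaching volumes yourself can be done from the AWS console, the AWS CLI, or an SDK. A minimal sketch using boto3's `EC2.Client.attach_volume` is below; the client is passed in so the function can be exercised without AWS access. The function name, instance ID, volume IDs, and device names are hypothetical, and each volume must already exist in the same Availability Zone as the instance.

```python
from typing import Any, Dict, List


def attach_volumes(ec2_client: Any, instance_id: str,
                   volumes: Dict[str, str]) -> List[dict]:
    """Attach each EBS volume (volume ID -> device name) to one instance.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    responses = []
    for volume_id, device in volumes.items():
        # attach_volume takes Device, InstanceId and VolumeId keyword arguments.
        responses.append(ec2_client.attach_volume(
            Device=device, InstanceId=instance_id, VolumeId=volume_id))
    # Attachment is asynchronous; a caller could block on
    # ec2_client.get_waiter("volume_in_use").wait(VolumeIds=[...]).
    return responses
```

For example, `attach_volumes(boto3.client("ec2"), "i-0123456789abcdef0", {"vol-0123456789abcdef0": "/dev/sdf"})`. An attached EBS volume stays attached across stop/start, unlike instance-store (ephemeral) disks, whose data is lost when the instance stops. The volumes still need to be formatted once and mounted before HDFS uses them, and the OS may expose `/dev/sdf` under another name such as `/dev/xvdf` or an NVMe device.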

Re: Does the Cloudera Quick Start for AWS support using EBS for persistent data storage?

Contributor

Hi David,

My recommendation would be to write a bootstrap script that mounts the EBS volumes. Include this bootstrap script in each instance group so that the disks are available before the CDH services are initialized. If the volumes are mounted after the services are set up, you would have to reconfigure and reinitialize a number of CDH services. Do note that you are going off the well-trodden path, and that leveraging S3 for now will probably save you some grief.
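
A minimal Python sketch of such a mount step, assuming Python 3.7+ is present on the image and that the instance group's bootstrap script can invoke it (for example, from a shell bootstrap script). It formats a volume only when `blkid` finds no filesystem on it, so data from earlier runs is never wiped, then mounts it and records an `/etc/fstab` entry so the mount returns after a stop/start. The `Volume` type, function names, device names, and mount points are all hypothetical.

```python
import subprocess
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Volume:
    device: str       # OS device name, e.g. "/dev/xvdf" (hypothetical)
    mount_point: str  # e.g. "/data0" (hypothetical)


def has_filesystem(device: str) -> bool:
    """True if blkid finds a filesystem signature (blkid exits 0 on a match)."""
    result = subprocess.run(["blkid", device],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def plan_mount_commands(volumes: Iterable[Volume], formatted: Set[str]) -> List[List[str]]:
    """Commands that format (only if blank) and then mount each volume."""
    commands = []
    for vol in volumes:
        if vol.device not in formatted:
            # Never reformat a volume that already holds data.
            commands.append(["mkfs.ext4", "-q", vol.device])
        commands.append(["mkdir", "-p", vol.mount_point])
        commands.append(["mount", "-o", "noatime", vol.device, vol.mount_point])
    return commands


def fstab_entries(volumes: Iterable[Volume]) -> List[str]:
    """fstab lines; nofail lets the instance boot even if a volume is missing."""
    return [f"{v.device} {v.mount_point} ext4 defaults,noatime,nofail 0 2"
            for v in volumes]


def run(volumes: List[Volume], dry_run: bool = True) -> None:
    """Print the plan when dry_run is True; otherwise execute it (needs root)."""
    formatted = {v.device for v in volumes if has_filesystem(v.device)}
    for cmd in plan_mount_commands(volumes, formatted):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    if dry_run:
        for line in fstab_entries(volumes):
            print(f"fstab: {line}")
        return
    with open("/etc/fstab", "r+") as fstab:
        existing = fstab.read()  # file position is now at the end
        if existing and not existing.endswith("\n"):
            fstab.write("\n")
        for line in fstab_entries(volumes):
            if line not in existing:
                fstab.write(line + "\n")
```

Calling `run([Volume("/dev/xvdf", "/data0")], dry_run=True)` prints the planned commands without changing anything. The resulting mount points would then be listed as DataNode directories in `dfs.datanode.data.dir`. Referring to filesystems by UUID in fstab is more robust than device names, which can change between boots.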

Re: Does the Cloudera Quick Start for AWS support using EBS for persistent data storage?

Explorer

Hello,

Why is this off the beaten track? As I understand it, S3 is not really suitable as a platform for an HDFS filesystem. I need volumes and filesystems attached to the nodes (as you would with nodes and JBODs in an on-site installation) so that the data persists between cluster restarts. Otherwise, surely using AWS for a cluster is a poor solution? If you have to back up all your data stored on ephemeral storage before you stop the instances, and then restore it once they are restarted, that's a very clumsy solution. Or am I misunderstanding a fundamental concept?

Best regards

Re: Does the Cloudera Quick Start for AWS support using EBS for persistent data storage?

Contributor

The suggestion here is not to use S3 to populate HDFS; as you have gathered, that would involve moving data back and forth. It isn't necessary for your use case, though some folks do just that when their raw data is in S3 and they need HDFS to meet latency requirements.

Instead, the suggestion is to have your processing jobs (MapReduce, Impala, etc.) run directly off S3. See this blog post as an example: http://blog.cloudera.com/blog/2016/08/analytics-and-bi-on-amazon-s3-with-apache-impala-incubating/
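
To illustrate what running Impala directly off S3 can look like, here is a minimal sketch that builds an Impala `CREATE EXTERNAL TABLE ... LOCATION 's3a://...'` statement. The function name, table, columns, and bucket are hypothetical, Parquet is only an assumed storage format, and the helper just generates SQL text; you would still run it (for example in `impala-shell`) against a cluster configured for S3 access.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

# Plain identifiers only, to keep the generated SQL safe to paste.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def s3_external_table_ddl(table: str, columns: List[Tuple[str, str]],
                          location: str, stored_as: str = "PARQUET") -> str:
    """Build an Impala CREATE EXTERNAL TABLE statement whose data lives in S3."""
    parsed = urlparse(location)
    if parsed.scheme != "s3a" or not parsed.netloc:
        raise ValueError(f"expected an s3a://bucket/path location, got {location!r}")
    for name in [table] + [col for col, _ in columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns)
    return (f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
            f"STORED AS {stored_as}\n"
            f"LOCATION '{location}';")
```

Because the table is `EXTERNAL` and its files live in S3, stopping the cluster (or dropping the table definition) does not delete the data; a restarted or rebuilt cluster can recreate the table and query the same files.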

"Off the beaten track" referred to using a bootstrap script to mount EBS volumes. That extends Director's existing functionality, and EBS support should come out of the box soon. Until then, I'm not aware of anyone else who has written such a bootstrap script, hence the warning.

Hope this helps.
