
Problem importing a CDH 5 instance into AWS

New Contributor

Hello experts,

 

I'm trying to run a test server on AWS, since I cannot run the CDH 5 QuickStart distribution on my laptop.

 

So I'm trying to import it into AWS EC2 via the command-line tools. I solved many problems along the way, but then ran into this error message:

 

"This does not appear to be a Stream Optimized VMDK.."

 

1. How can I solve this? I downloaded the VMware version, but if you think one of the other versions (VirtualBox or KVM) would avoid the problem, I'm ready to give it a shot. Or is there an easy and free way to convert the image? This is just a personal test and should not be costly for me.

 

2. I just want to have a CDH instance in the cloud for test purposes. If Cloudera offers anything free and ready to use on AWS, that's also welcome.

 

Please help me. I've been dealing with it for the last few days, and it's driving me crazy at this point.

 

BR

 

Serhan

1 ACCEPTED SOLUTION

New Contributor

Hi to myself and all the relevant viewers,

 

I solved the problem by opening the required ports in AWS. There are many of them, but basically I allowed the following. For the full list, refer to the documentation.

 

 

[Screenshot: Capture.PNG, the list of allowed ports]
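
Once the ports are allowed, a quick way to confirm they are actually reachable from your own machine is a small TCP check. This is only a minimal sketch: the port list below is my assumption based on common CDH defaults (7180 for Cloudera Manager, 8888 for Hue, 10000 for HiveServer2, 21050 for Impala over JDBC/ODBC), not the list from the screenshot, and `port_open` / `check_ports` are hypothetical helper names.

```python
import socket

# Assumed port list (common CDH defaults), NOT the list from the screenshot.
ASSUMED_PORTS = {
    7180: "Cloudera Manager web UI",
    8888: "Hue",
    10000: "HiveServer2",
    21050: "Impala (JDBC/ODBC)",
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports=ASSUMED_PORTS):
    """Map each port number to True/False depending on reachability."""
    return {port: port_open(host, port) for port in ports}
```

Calling `check_ports("203.0.113.10")` (a placeholder address, replace it with the instance's public IP) returns a dictionary of port numbers and booleans. A port that shows False is either still blocked in AWS or has no service listening on it yet.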

I will list the steps I followed to make Cloudera Quickstart work on AWS.

 

- First, download the CDH 5.3 QuickStart VM, VMware version.

- Download and set up the AWS command-line tools properly. You need an AWS account and S3 storage, of course.

- Tune the VM resources appropriately. Definitely use 8+ GB of RAM and 2+ cores for this version of QuickStart.

- The downloaded VMware VMDK cannot be uploaded to AWS as it is. You must first convert it to a streamable format by exporting the VM as OVF. I did that with the trial version of VMware Workstation. In the end you will have one solid VMDK file instead of several partial ones.

- Using the command-line tools, run the ec2-import-instance command to upload the instance to AWS EC2. It first uploads the file to S3. The S3 region matters, because the instance is created in the same EC2 region as the S3 storage.

- Upload time depends on your Internet upload speed. In Turkey it is poor in general, so my upload took around 24 hours.

- If everything is all right so far, you will see your instance created. However, you may not be able to connect to it if you had no keys created in that region. AWS requires instances to be created with your keys, and in my case the import was done without keys, so it did not recognize them. I worked around this by stopping the instance, creating an image of it, terminating the original instance and launching a new one from the image. The launch then asks for keys, and you simply create them. If your S3 and EC2 regions are the same and you already have keys created in that EC2 region, it should work directly. I just missed that part.

- Then just start the instance and allow the ports above. You can connect to Cloudera Manager (CM) via the instance's public IP and port.
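
Before spending hours on the upload, it may be worth checking locally whether the exported VMDK really is stream optimized. The sketch below is a heuristic under stated assumptions: it relies on the VMDK sparse extent layout as I understand it (a `KDMV` magic, with the descriptor offset and size stored as 64-bit sector counts at byte 28 of the header) and looks for `createType="streamOptimized"` in the embedded descriptor. `vmdk_create_type` and `is_stream_optimized` are hypothetical names, and this is not the check AWS itself performs.

```python
import struct

SECTOR = 512          # VMDK offsets are expressed in 512-byte sectors
VMDK_MAGIC = b"KDMV"  # magic at the start of a sparse extent


def vmdk_create_type(data):
    """Return the createType from the embedded descriptor, or None.

    `data` must hold the first sectors of the file (header + descriptor).
    """
    if len(data) < SECTOR or data[:4] != VMDK_MAGIC:
        return None  # plain-text descriptor file or not a sparse extent
    # Assumed layout: magic(4) version(4) flags(4) capacity(8) grainSize(8),
    # then descriptorOffset(8) and descriptorSize(8), all little-endian.
    desc_offset, desc_size = struct.unpack_from("<QQ", data, 28)
    start = desc_offset * SECTOR
    descriptor = data[start:start + desc_size * SECTOR]
    for line in descriptor.split(b"\n"):
        line = line.strip()
        if line.startswith(b"createType="):
            value = line.split(b"=", 1)[1].strip().strip(b'"')
            return value.decode("ascii", "replace")
    return None  # no embedded descriptor found in the bytes we were given


def is_stream_optimized(path, sectors_to_read=64):
    """Heuristic check of a VMDK file on disk."""
    with open(path, "rb") as f:
        head = f.read(sectors_to_read * SECTOR)
    return vmdk_create_type(head) == "streamOptimized"
```

For example, `is_stream_optimized("quickstart-disk1.vmdk")` (a made-up file name) should return True for the file produced by the OVF export. For the partial VMDK files of the original download I would expect False, since their descriptor normally lives in a separate text file.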

 

There is also an alternative way, which may be more useful and professional:

 

http://blog.cloudera.com/blog/2013/03/how-to-create-a-cdh-cluster-on-amazon-ec2-via-cloudera-manager...

 

 

Tricks, tricks, tricks. It's so hard when you don't know but still exciting to learn from mistakes.

 

BR

 

Serhan

 


3 REPLIES

Importing the VM image onto AWS would be quite time-consuming. Since you're
interested in free options, I can suggest trying Cloudera Live.

http://cloudera.com/live

Cloudera Live is the fastest and easiest way to get started with Apache
Hadoop and it now includes self-guided, interactive demos and tutorials.
With a one-button deployment option, you can spin up a four-node cluster
of CDH, Cloudera’s open source Hadoop platform, within minutes. This free,
cloud-based Hadoop environment lets you:

- Learn the basics of Hadoop (and CDH) through pre-loaded, hands-on
tutorials
- Plan your Hadoop project using your own datasets
- Explore the latest features in CDH
- Extend the capabilities of Hadoop and CDH through familiar partner
tools, including Tableau, Zoomdata, and Trifacta



Regards,
Gautam Gopalakrishnan

New Contributor

Hello Gautam,

 

Thanks for the advice.

 

I work for QlikView Turkey and I would like to have a test machine up and running in order to test connections to Impala and Hive from Qlik products. Is it possible to connect to Cloudera Live via ODBC?

 

Basically, I need to reproduce the setup in the post below.

 

http://blog.cloudera.com/blog/2015/02/how-to-do-real-time-big-data-discovery-using-cloudera-enterpri...

 

By the way, I managed to import the instance into AWS after doing the OVF conversion with the VMware Workstation trial. The image is there now, and I am able to connect to it via SSH.

 

The only problem now is that I cannot get the CDH services running. My AWS instance has 4 cores and 16 GB of RAM.

 

I cannot see what the problem is. It's driving me crazy. Please help.

 

BR

 

Serhan

 

 

 
