I've installed a CDH4.5 Hadoop cluster on Amazon EC2 using the instructions here:
All seems to be working OK; however, I can't connect to it from a Windows VM on my laptop using either the Hive or Impala ODBC drivers. I've connected this VM to the Quickstart VM in the past, via the Impala ODBC drivers, but I can't seem to connect to CDH4 running on EC2 at all. Checking one of the EC2 instances, it doesn't even seem as if port 10000 (the Hive port) is being used, but Hive is running, and the configuration properties for HiveServer2 within CM say it's using port 10000.
Ports are open within the EC2 security group. Is there something obvious I'm missing here?
I believe in your first post you mention that you are using CM. If you're using CM to manage the cluster then you won't see the hive-server2 service from a command line. You'll have to add the instance and start it from CM. The default settings for HiveServer2 are listed in the configuration, but by default the instance is not added or started. Here is the documentation for adding a role instance. Once you have added the hiveserver2 instance then you can start it and should be able to access it straight away.
Hopefully this will get you going. Please let me know your results.
You can also use the following commands on the quickstart vm or your ec2 setup to verify that port 10000 is in use once you start hiveserver2:
sudo netstat -tulpn | grep 10000
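The netstat check only tells you what is listening on the node itself. To check the same port from the client side (your Windows VM or the Mac host), here is a minimal Python sketch that attempts a plain TCP connection. The function name `port_open` and the 3-second timeout are my own choices for illustration, and the hostname is whatever public name you use for the EC2 node. Since ping (ICMP) can be blocked while TCP port 10000 is open, this is a more relevant test than ping.

```python
import socket
import sys


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers refused connections, timeouts and name resolution failures.
        return False
    conn.close()
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_port.py <hostname> [port]
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 10000
    print(port_open(sys.argv[1], target_port))
```

If this prints False from the client but netstat shows a listener on the node, the problem is most likely somewhere in between (security group, listen address, or name resolution) rather than in the ODBC driver.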
No, I can't ping it from either the Windows VM running locally or the Mac host that it's running on.
According to the jclouds#cloudera-cdh security group that the install wizard sets up, only the original (CDH) instance can ping the hadoop instances; but a whole range of ports (including 10000) are then also open to 0.0.0.0/0. So this (correct?) would explain why I can't ping the instance, but this shouldn't be an issue?
Sorry, ignore the bit about the Quickstart VM.
I'm running the Cloudera Hadoop cluster on EC2, using the Cloudera Manager 4.5 installer and the standard (free) option. I'm then trying to connect to it from a Windows VM, running on my laptop, using the Cloudera ODBC 2.5.5 Hive drivers; whilst I can connect to CDH on the EC2 setup from a web browser on that VM, I can't get a successful connection using the ODBC drivers. I'm wondering therefore if there's something else I need to enable, either in the CDH/Hive setup or in the EC2 setup, to make this work.
Just noticed this other post on the forum - although it's the Quickstart VM, he's had the same problem as me. I also had the same experience with the Quickstart VM (separately) - could connect to Impala via ODBC, but couldn't connect to Hive.
Unfortunately that post didn't have a solution either...
Yes I configured the DSN etc on Windows. I've also seen the ODBC guide as well (thanks though).
I've been able to connect to Hive before, on a different VM (the Oracle "bigdatalite" one), so I know how Hive ODBC connectivity works. It's just that I can't seem to connect to Hive running on a Hadoop cluster built using CDH4 - the other forum post I referenced had the same issue, so I don't think it's just me. But I expect there's some setting I'm not aware of that's making it happen.
Can you describe what your network configuration is within the cluster? More specifically, consider the following points, which you should verify within your deployment (please don't post hostnames or IPs).
I believe EC2 nodes are multi-homed. Validate for yourself what the host-naming is resolving to across those interfaces. Look at what forward and reverse lookups are returning as well.
Some of the network configurations for components have a "wildcard" setting that can be found when you search within a service's configuration settings. This is so the service listens on "all" interfaces.
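To make the wildcard point concrete, here is a small Python sketch (the helper names `bound_address` and `is_wildcard` are just for illustration). It shows what a socket reports when bound to the wildcard address versus loopback, and how to read the "Local Address" column of netstat output: a listener on 0.0.0.0:10000 (or :::10000 for IPv6) accepts connections on every interface, while 127.0.0.1:10000 is only reachable from the node itself.

```python
import socket


def bound_address(bind_host):
    """Bind a throwaway TCP socket to bind_host on an ephemeral port
    and return the address it reports listening on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((bind_host, 0))  # port 0 lets the OS pick a free port
        sock.listen(1)
        return sock.getsockname()[0]
    finally:
        sock.close()


def is_wildcard(local_address):
    """True if a netstat-style 'host:port' local address is an
    all-interfaces (wildcard) bind."""
    host = local_address.rsplit(":", 1)[0]
    return host in ("0.0.0.0", "::", "*")


if __name__ == "__main__":
    print(bound_address("0.0.0.0"))        # wildcard bind
    print(bound_address("127.0.0.1"))      # loopback-only bind
    print(is_wildcard("0.0.0.0:10000"))
    print(is_wildcard("127.0.0.1:10000"))
```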
For yourself, from both the EC2 cluster nodes you are trying to connect to, and your VM, please evaluate what comes back for this command line in comparison to the naming you are using between your VM and the EC2 environment:
# python -c "import socket; print(socket.getfqdn()); print(socket.gethostbyname(socket.getfqdn()))"
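If you want the reverse lookups as well, the one-liner can be expanded into a short script. This is only a sketch using the standard `socket` module; the function name `lookup_report` and the shape of the dictionary it returns are my own invention. It does a forward lookup of the name, then a reverse lookup of each address that comes back, so you can see whether the two directions agree.

```python
import socket


def lookup_report(name=None):
    """Forward-resolve name (default: this host's FQDN), then
    reverse-resolve every IPv4 address that comes back."""
    name = name or socket.getfqdn()
    report = {"name": name, "addresses": [], "reverse": {}}
    try:
        _, _, addresses = socket.gethostbyname_ex(name)
    except socket.error as exc:
        # Forward lookup failed; record the error and stop here.
        report["error"] = str(exc)
        return report
    report["addresses"] = addresses
    for addr in addresses:
        try:
            report["reverse"][addr] = socket.gethostbyaddr(addr)[0]
        except socket.error:
            # No reverse (PTR) entry could be resolved for this address.
            report["reverse"][addr] = None
    return report


if __name__ == "__main__":
    result = lookup_report()
    print(result["name"])
    for addr in result["addresses"]:
        print("%s -> %s" % (addr, result["reverse"][addr]))
```

Run it on the cluster nodes and on your VM, and compare the names and addresses it prints with the hostname you put in the DSN. On EC2 in particular, check whether the name resolves to an internal address on the nodes and to something different from outside.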