Created on 01-03-2014 09:41 AM - edited 09-16-2022 01:51 AM
Hi All,
I've installed a CDH4.5 Hadoop cluster on Amazon EC2 using the instructions here:
All seems to be working OK; however, I can't connect to it from a Windows VM on my laptop using either the Hive or Impala ODBC drivers. I've connected this VM to the Quickstart VM in the past via the Impala ODBC drivers, but I can't seem to connect to CDH4 running on EC2 at all. Checking one of the EC2 instances, it doesn't even look as if port 10000 (the Hive port) is in use, yet Hive is running, and the configuration properties for HiveServer2 within CM say it's using port 10000.
Ports are open within the EC2 security group. Is there something obvious I'm missing here?
Mark
Created 01-07-2014 07:31 PM
I believe in your first post you mention that you are using CM. If you're using CM to manage the cluster then you won't see the hive-server2 service from a command line. You'll have to add the instance and start it from CM. The default settings for HiveServer2 are listed in the configuration, but by default the instance is not added or started. Here is the documentation for adding a role instance. Once you have added the hiveserver2 instance then you can start it and should be able to access it straight away.
Hopefully this will get you going. Please let me know your results.
You can also use the following command on the Quickstart VM or your EC2 setup to verify that port 10000 is in use once you start HiveServer2:
sudo netstat -tulpn | grep 10000
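The netstat check above runs on the server itself. To test the same port from the client side (for example, from the Windows VM), a minimal Python sketch such as the following could be used. The function name `port_is_open` and the 3-second timeout are illustrative assumptions, not part of any Cloudera tooling; the check only confirms that a TCP connection can be opened, not that HiveServer2 accepts the ODBC session.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A False result can mean the
    service is not listening, a firewall or security group is blocking the
    port, or the host name could not be resolved.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call (replace the placeholder host with your own instance):
# print(port_is_open("your-ec2-hostname", 10000))
```

If this returns False from the client while netstat shows the port listening on the server, the likely cause is something in between, such as a security group rule or port forwarding setting.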
Dave
Created 01-08-2014 09:32 AM
Hi - thanks for the background.
One last question (promise) - if I'm also going to connect to Impala on either the Quickstart VM or an EC2 install (using Cloudera's ODBC drivers for Impala), should I also connect using port 10000, i.e. the Hiveserver2 port? Or should I use 21050?
Reason I ask is that now testing the Impala drivers, 10000 works, but I can't get a connection to work on 21050 (although I seem to remember it worked on that port before...)
Mark
Created 01-08-2014 09:38 AM
You should use port 21050 to connect to Impala, as long as that port hasn't been changed in your settings. Choose No Authentication if you do not have security set up on EC2/Quickstart.
Glad to see the HS2 connection is up and running!
Created 01-08-2014 11:15 AM
Thanks. One other issue I hit with Impala is that, on the EC2 install, the port isn't open (21050); this looks like it's because the maximum number of security rules in an AWS security group has been exceeded by the installer. You can add more security groups to an instance, so I'll try that route.
Created 05-30-2014 05:02 AM
I am encountering issues as well.
I'm using the Cloudera Quickstart VM with NAT networking and port forwarding, and I have included port 10000.
I managed to connect Pentaho Kettle to Hive. I have both the 32-bit and 64-bit versions of Tableau installed.
Have followed the instructions above, starting up hiveserver2 etc, however I still get this error:
Driver Version: V2.5.0.1001
Running connectivity tests...
Attempting connection
Failed to establish connection
SQLSTATE: HY000[Cloudera][Hardy] (34) Error from Hive: Bad version identifier.
TESTS COMPLETED WITH ERROR.
I am running CDH 4.4...is this an issue? Anyone know how to solve this?
Many thanks in advance.
Created 05-30-2014 07:57 AM
I'd guess that your driver and HS2 have a version mismatch. It's not clear from your post exactly what you are using to try to connect to HS2. You said that you got Pentaho Kettle to connect successfully, and that you have both 32-bit and 64-bit Tableau, but not which client produced the failure.
You might want to verify the compatibility of your driver with CDH versions (i.e., check with your driver's vendor), and/or try posting your question in the Hive forums, as this doesn't seem to be an issue with Cloudera Manager.
Created 05-30-2014 08:53 AM
Hi, sorry to hear you're having problems.
CDH 4.4 should work. I have the CDH4.4 VM running on my laptop with NAT and can connect to it.
Are you choosing an authentication mechanism or leaving it as no authentication?
If you are choosing No Authentication, you will need to disable impersonation for HiveServer2 and add the following to the safety valve for hive-site.xml:
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>
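Before pasting a fragment like the one above into the safety valve, it can help to confirm that it is well-formed XML. The following is a minimal sketch using only the Python standard library; the function name `parse_safety_valve` and the temporary `<configuration>` wrapper element are assumptions made for illustration, and the check says nothing about whether HiveServer2 accepts the values.

```python
import xml.etree.ElementTree as ET

# The fragment from the post above, as it would be pasted into the safety valve.
FRAGMENT = """
<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>
"""


def parse_safety_valve(fragment: str) -> dict:
    """Parse <property> elements from a hive-site.xml fragment.

    Hypothetical helper: wraps the fragment in a temporary root element so
    that several sibling <property> blocks can be parsed at once. Raises
    xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring("<configuration>" + fragment + "</configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if not name or not name.strip():
            raise ValueError("property element without a <name>")
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


print(parse_safety_valve(FRAGMENT))
```

An unclosed or mistyped tag raises `ParseError` here, which is easier to diagnose than a HiveServer2 role that fails to start after the configuration is deployed.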
Another option you could try is to leave HiveServer2 as is and choose User Name authentication and supply a user name.
You may also want to try the newest version of the ODBC driver.