Hive is one of the most common used databases on Hadoop, users of Hive are doubling per year due to the amazing enhancements and the addition of Tez and Spark that enabled Hive to by pass the MR era to a an in-memory execution that changed how people are using Hive.
in this blog post, I will show you how to connect squirrel Sql Client to Hive, the concept is similar to any other clients out there as long as you are using the open-source libraries that matches the ones here you should be fine.
Follow the Squirrel documentation and run it on your Mac or PC.
Step 2
Follow the Hortonworks HDP Installation on VritualBox, VMware or Hyper-V and start up the virtual Instance.
Step 3
once you are HDP is up and running, connect it it using SSH as it shows on the console, once you are connected you need to download some JAR files in order to establish the connection.
Step 4
if you are using MacOS, simply while you are connected to you HDP instance search for the following JARs using the command:
root> find / -name JAR_FILE
once you find the file needed, easily copy it using SCP to your laptop/PC
the files you should look for are the following (versions will differ base on which Sandbox you are running but different versions are unlikely to cause a problem)
commons-logging-1.1.3.jar
hive-exec-0.14.0.2.2.4.2-2.jar
hive-jdbc-0.14.0.2.2.4.2-2.jar
hive-service-0.14.0.2.2.4.2-2.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
libthrift-0.9.0.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
hadoop-common-2.6.0.2.2.4.2-2.jar
if you are running windows you might need to install winSCP in order to grab the files from their locations.
Step 5
Once all Jars above are downloaded into your local machine, Open up Squirrell and go to Drivers and Add New Driver.
Name: Hive Driver (could be anything else you want)
Example URL: jdbc:hive2://localhost:10000/default
Class Name: org.apache.hive.jdbc.HiveDriver
go to Extra Class Paths and add all the JARS you downloaded
you may change the port no or IP addresses if you are not running with the defaults.
Step 6
login to you Hadoop Sandbox and verify that HIVESERVER2 is running using:
netstat -anp | grep 10000
if there was nothing running you can hiveserver2 manually
hive> hiveserver2
Step 7
once you verify hiveserver2 is up and running you are ready to test the connection on Squirrel by creating a new Alias as following
you are now ready to connect, once connection is successful you should get a screen like this
Step 8 (Optional)
With your first Hive Query, Squirrel can be buggy and complain about memory and heap size, if this ever occurred, if you are on Mac, right click on the app icon-->show package contents-->open info.plist and add the following snippet