Created on 12-21-201608:18 PM - edited 08-17-201907:10 AM
SQL development tools like DbVisualizer, SQuirreL SQL and DataGrip are popular options for database development. Although these tools don't offer native Hive support they can be easily configured to connect to Hive using JDBC. While connecting these tools to clusters without kerberos is relatively straightforward, the process of connecting them to kerberized clusters can be complex and error prone. This article, in combination with a project I created (
Hive JDBC Uber Jar), aim to simplify and standardize this process.
There are a few key things that must be properly configured before attempting to connect to a kerberized cluster. A full description of these tasks is out of scope for this article, but at a high level, make sure that:
You have downloaded the latest release of my Hive JDBC Uber Jar and placed it somewhere sensible
DbVisualizer and/or DataGrip have been successfully installed on your workstation
The krb5.conf file on your workstation matches the one on your cluster
You have a valid kerberos principal that can access the appropriate services your cluster
You can successfully kinit from your workstation against the realm specified in your krb5.conf file
kinit with an appropriate principal and launch DbVisualizer
Open DbVisualizer preferences ("DbVisualizer" > "Preferences") and add the following properties. DbVisualizer will need to be restarted after applying these changes.
Open the Diver Manager dialog ("Tools" > "Driver Manager...") and hit the "Create a new driver" icon.
Fill in the information as seen below. For the "Driver File Paths" you are pointing to the
hive-jdbc-uber-x.jar that you just downloaded.
Create a new connection ("Database" > "Create Database Connection") and fill out the details based on your cluster as seen below. Please note that you must append the "principal" to the "database" parameter for kerberized connections.
Hit the "Connect" button to test the connection. You should see something like the following in the "Connection Message" text area if the connection is successful.
You are now ready to execute your first query against Hive using DbVisualizer!
JetBrains DataGrip Setup
kinit with an appropriate principal and launch DataGrip
Under "File" > "Data Sources...", create a new Driver. Make sure you load the hive-jdbc-uber-x.jar that you just downloaded.
After creating the "Project Data Source", test the connection. You should see the following:
You are now ready to execute your first query against Hive using DataGrip!
A note about the Hive JDBC Uber Jar
When I first created this project the intent was to gather all required Hive dependencies into one single jar file to simplify scenarios like the one described here. This worked very well for connecting to non-kerberized clusters, but when I began to test against kerberized clusters I hit the following exception:
java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
This exception is caused because a class named org.apache.hadoop.util.VersionInfo fails to find a file called *-version-info.properties when loaded by some tools. A number of articles on the web suggest resolving this "classpath" issue by copying jars into unnatural places or hacking tool startup scripts. Neither approach sat well with me. Instead, I enhanced the way org.apache.hadoop.util.VersionInfo locates the required properties file and included this updated version of the code in my jar. For more details, check out the README.