SQL development tools like DbVisualizer, SQuirreL SQL and DataGrip are popular options for database development. Although these tools don't offer native Hive support, they can be easily configured to connect to Hive using JDBC. While connecting these tools to clusters without Kerberos is relatively straightforward, connecting them to kerberized clusters can be complex and error-prone. This article, in combination with a project I created (Hive JDBC Uber Jar), aims to simplify and standardize this process.
Prerequisites
There are a few key things that must be properly configured before attempting to connect to a kerberized cluster. A full description of these tasks is out of scope for this article, but at a high level, make sure that:
You have downloaded the latest release of my Hive JDBC Uber Jar and placed it somewhere sensible
DbVisualizer and/or DataGrip have been successfully installed on your workstation
The krb5.conf file on your workstation matches the one on your cluster
You have a valid Kerberos principal that can access the appropriate services on your cluster
You can successfully kinit from your workstation against the realm specified in your krb5.conf file
DbVisualizer Setup
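Before the first step, the last two prerequisites can be checked from a terminal. The helper below is a minimal sketch, not part of the Hive JDBC Uber Jar project: `krb5_default_realm` is a hypothetical function name, and `/etc/krb5.conf` is assumed as the default location, which may differ on your workstation.

```shell
# Print the default_realm configured in a krb5.conf file.
# The path argument is optional; /etc/krb5.conf is an assumed default location.
krb5_default_realm() {
  conf="${1:-/etc/krb5.conf}"
  awk -F'=' '/^[ \t]*default_realm[ \t]*=/ { gsub(/[ \t\r]/, "", $2); print $2; exit }' "$conf"
}

# Typical use (commented out because it needs a reachable KDC):
#   kinit "your_user@$(krb5_default_realm)"   # your_user is a placeholder
#   klist                                     # lists the tickets in your credential cache
```

If `klist` shows a ticket for the expected realm, the workstation side of the Kerberos setup is probably in good shape.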
kinit with an appropriate principal and launch DbVisualizer
Open DbVisualizer preferences ("DbVisualizer" > "Preferences") and add the following properties. DbVisualizer will need to be restarted after applying these changes.
Open the Driver Manager dialog ("Tools" > "Driver Manager...") and hit the "Create a new driver" icon.
Fill in the information as seen below. For the "Driver File Paths", point to the hive-jdbc-uber-x.jar that you just downloaded.
Create a new connection ("Database" > "Create Database Connection") and fill out the details based on your cluster as seen below. Please note that you must append the "principal" to the "database" parameter for kerberized connections.
Hit the "Connect" button to test the connection. You should see something like the following in the "Connection Message" text area if the connection is successful.
You are now ready to execute your first query against Hive using DbVisualizer!
JetBrains DataGrip Setup
kinit with an appropriate principal and launch DataGrip
Under "File" > "Data Sources...", create a new Driver. Make sure you load the hive-jdbc-uber-x.jar that you just downloaded.
Create a new "Project Data Source" using the new Driver. On the "General" tab, do the following:
Then add the following flags to "VM Options" on the "Advanced" tab.
After creating the "Project Data Source", test the connection. You should see the following:
You are now ready to execute your first query against Hive using DataGrip!
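Both setups come down to handing the driver a HiveServer2 JDBC URL with the Kerberos service principal appended. As a minimal sketch of that URL's shape: the function name `hive_jdbc_url` is hypothetical, and the host, port, database and principal below are placeholders, not values from this article.

```shell
# Build a HiveServer2 JDBC URL for a kerberized cluster.
# Arguments: host, port, database, HiveServer2 service principal.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s;principal=%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder values; substitute the details of your own cluster.
hive_jdbc_url "hs2.example.com" 10000 "default" "hive/_HOST@EXAMPLE.COM"
```

The principal here is the one HiveServer2 runs as, not the user principal you passed to kinit.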
A note about the Hive JDBC Uber Jar
When I first created this project, the intent was to gather all required Hive dependencies into a single jar file to simplify scenarios like the one described here. This worked very well for connecting to non-kerberized clusters, but when I began to test against kerberized clusters I hit the following exception:
java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
This exception occurs because a class named org.apache.hadoop.util.VersionInfo fails to find a file called *-version-info.properties when loaded by some tools. A number of articles on the web suggest resolving this "classpath" issue by copying jars into unnatural places or hacking tool startup scripts. Neither approach sat well with me. Instead, I enhanced the way org.apache.hadoop.util.VersionInfo locates the required properties file and included this updated version of the code in my jar. For more details, check out the README.
Repo Description
This project provides a quick and easy way to build a small, local YUM repo for the Hortonworks Data Platform (HDP) using Vagrant. The process mirrors the official Hortonworks documentation fairly closely. Building and referring to a local YUM repository can be very useful if you have limited bandwidth or an unreliable internet connection. It can also significantly reduce your data plan consumption if you frequently build Hadoop clusters, as I do. This can be easily paired with my Vagrant Generator project by updating that project's "application.properties" file.
See related projects:
https://community.hortonworks.com/repos/37922/hdp-vagrant-generator.html
https://community.hortonworks.com/repos/37882/vagrant-base-box-for-hdp.html
Repo Info
Github Repo URL: https://github.com/timveil/hdp-vagrant-local-repo
Github account name: timveil
Repo name: hdp-vagrant-local-repo
This was the hint I needed. Here is a link to the Vagrantfile I used to test. It includes both the Kerberos command prerequisites and the Ambari Blueprint with related calls. The key, for me, was ensuring this was run before submitting the blueprint.
# make sure Kerberos packages are installed
yum install krb5-libs krb5-server krb5-workstation -y
# modify Kerberos files
sed -i "s/kerberos.example.com/hdp-common-secure.hdp.local/gI" /etc/krb5.conf
sed -i "s/EXAMPLE.COM/hdp.local/gI" /etc/krb5.conf
sed -i "s/#//g" /etc/krb5.conf
sed -i "s/EXAMPLE.COM/hdp.local/gI" /var/kerberos/krb5kdc/kadm5.acl
# create Kerberos database and add principal. "Bbh2z8HrVx" is my master password
kdb5_util create -s -P Bbh2z8HrVx
kadmin.local -q 'addprinc -pw admin admin/admin' -w Bbh2z8HrVx
# start and enable Kerberos services
systemctl start krb5kdc
systemctl enable krb5kdc
systemctl start kadmin
systemctl enable kadmin
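To see what the three sed substitutions do before they touch /etc/krb5.conf, they can be run against a scratch file without the -i flag. This is a rough sketch: `preview_krb5_edits` is a hypothetical helper, the sample content is an illustrative excerpt rather than a complete krb5.conf, and the `I` flag assumes GNU sed, as in the script above.

```shell
# Apply the same three substitutions as the provisioning script, but print
# the result instead of editing the file in place. Requires GNU sed (I flag).
preview_krb5_edits() {
  sed -e "s/kerberos.example.com/hdp-common-secure.hdp.local/gI" \
      -e "s/EXAMPLE.COM/hdp.local/gI" \
      -e "s/#//g" "$1"
}

# Illustrative excerpt only; a real /etc/krb5.conf has more entries.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
[libdefaults]
# default_realm = EXAMPLE.COM

[realms]
# EXAMPLE.COM = {
#  kdc = kerberos.example.com
#  admin_server = kerberos.example.com
# }
EOF

preview_krb5_edits "$sample"
rm -f "$sample"
```

Because the second substitution is case-insensitive, any remaining lowercase example.com is rewritten as well, and the last expression strips every # so the commented-out defaults become active.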
I'm trying to create an Ambari blueprint that will provision a single-node cluster using Kerberos (see https://issues.apache.org/jira/browse/AMBARI-13431 and Ambari Blueprint Example). My confusion is around the "credentials" block in the cluster creation template. All available documentation includes this snippet:
"credentials" : [
  {
    "alias" : "kdc.admin.credential",
    "principal" : "admin/admin",
    "key" : "admin",
    "type" : "TEMPORARY"
  }
]
My question is this: are the principal and key (password) included above intended to describe new credentials (to be created and used by Ambari) or existing credentials previously created by calling something like:
kadmin.local -q "addprinc admin/admin"
It boils down to what Kerberos configuration is required before using Blueprints to install and configure the cluster. In other words, how much of this should be done before creating the cluster via Blueprints?
Repo Description
A Vagrantfile generator for Hortonworks Data Platform (HDP). Built using Spring Boot, this application will generate a Vagrantfile based on the supplied application.properties. This makes it very easy to create purpose-built, custom VirtualBox HDP instances that are properly configured for your use case and hardware. Support for HDP 2.5 has just been added!
See related projects:
https://community.hortonworks.com/repos/37882/vagrant-base-box-for-hdp.html
https://community.hortonworks.com/repos/55628/local-yum-repository-for-hdp-using-vagrant.html
Repo Info
Github Repo URL: https://github.com/timveil/hdp-vagrant-generator
Github account name: timveil
Repo name: hdp-vagrant-generator
Repo Description
I often use Vagrant to construct purpose-built single or multi-node "sandbox" clusters as an alternative to the Hortonworks Sandbox. Properly preparing a Linux box for HDP can be time-consuming, complicated and error-prone. Furthermore, creating and maintaining a Vagrantfile full of complex boilerplate code is tedious and increases the time it takes to provision your box. This box simplifies building Vagrant-based clusters by handling the complexity for you.
See related projects:
https://community.hortonworks.com/repos/37922/hdp-vagrant-generator.html
https://community.hortonworks.com/repos/55628/local-yum-repository-for-hdp-using-vagrant.html
Repo Info
Github Repo URL: https://github.com/timveil/hdp-vagrant-base
Github account name: timveil
Repo name: hdp-vagrant-base
That's right. There is a *standalone.jar that ships with Hive that should do this but, as you correctly pointed out, does not. This repo works around that problem until it can be properly resolved. In my testing, I could not get any of my favorite JDBC clients to work using the original standalone jar by itself, as I had hoped. I wanted an easy way to bundle all dependencies into a single jar. I've also made some effort to clean up the logging dependencies by relying solely on SLF4J and its bindings.