Created on 05-16-202211:25 AM - edited on 04-21-202612:56 AM by GrazittiAPI
We all love Hue. However, it isn't always the perfect tool for doing a lot of querying, so you've probably looked toward some 3rd party tools for database connectivity. There are many such tools, but DBeaver has a very capable free version and is among the most popular. Connecting it to a Hive virtual warehouse isn't difficult, but there are a few gotchas that can make it frustrating. So let's conquer those obstacles.
Here is a video walkthrough of the process.
Provision a Hive Virtual Warehouse
Log into a CDP instance
Navigate to Data Warehouse
Enable your Data Warehouse Environment & Database Catalog
Create a New Virtual Warehouse
Give your virtual warehouse a unique name
Select Hive
Select your database catalog from the dropdown SSO may be enabled or disabled, the choice is yours!
Set Availability Zone and User Groups per your requirements or leave as-is
Pick the size (aka decide how much money you want to spend) The remaining options are going to depend on your needs, but the defaults are fine for our purposes.
Click Create. Expect approximately 5 minutes to create your virtual warehouse.
Download the JDBC Jar
From your virtual warehouse tile, click on the kebab icon in the upper right. You'll find all sorts of fun options under there, but we're primarily interested in the Download JDBC Jar option, which will download the Hive jar to your local machine. You can leave it in your Downloads folder, or move it to wherever you like to store your jars. It will be named similar to this:
hive-jdbc-3.1.0-SNAPSHOT-standalone.jar
Copy the JDBC URL
Again, from the kebab icon in your virtual warehouse tile, copy the JDBC URL. This URL has all the necessary information to make the connection, and should be of the form:
Next, we will create a new connection within DBeaver.
Create a new connection in DBeaver, selecting Apache Hive as the database driver.
For the Host, paste the JDBC URL you copied earlier. You'll need to remove the jdbc:hive2:// prefix, since the DBeaver JDBC URL template automatically adds that in for you.
Leave the port empty
Set the Database/Schema to the name of the database you want to connect in as (i.e. default)
Username/Password:
If the warehouse is SSO-enabled, use your SSO credentials.
If the warehouse is not SSO-enabled, use your CDP workload credentials.
Add the Hive jar
Click on the Edit Driver Settings button
DBeaver installs with a driver for Hive, but you may find it to not be fully compatible with your virtual warehouse. Open the Libraries tab and Delete the existing drivers to avoid any conflict.
Click Add File to add the Hive jar you downloaded earlier.
Click OK
Click Test Connection and verify that you can connect. If your virtual warehouse is SSO-enabled, DBeaver will open a browser tab to allow you to authenticate if you aren't already so authenticated.
Click Finish to save the new connection.
Once the connection is created, you can navigate the database, tables, columns, etc, as well as query your data. Congratulations, you did it (and there was much rejoicing).
Tips
If you have connectivity even after successfully testing your connection, doing an Invalidate/Reconnect or a full Disconnect + Reconnect to reset the connection.
If a query seems to take a long time to run, check the status of your virtual warehouse, it is likely that it was stopped and needs to restart to execute the query.
You may need your cloud firewall rules set to allow traffic on port 443 from your IP address.