Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
avatar
Cloudera Employee

We all love Hue. However, it isn't always the perfect tool for doing a lot of querying, so you've probably looked toward some 3rd party tools for database connectivity. There are many such tools, but DBeaver has a very capable free version and is among the most popular. Connecting it to a Hive virtual warehouse isn't difficult, but there are a few gotchas that can make it frustrating. So let's conquer those obstacles.

Here is a video walkthrough of the process.

Provision a Hive Virtual Warehouse

  1. Log into a CDP instance
  2. Navigate to Data Warehouse
  3. Enable your Data Warehouse Environment & Database Catalog
  4. Create a New Virtual Warehouse
    1. Give your virtual warehouse a unique name
    2. Select Hive
    3. Select your database catalog from the dropdown
      SSO may be enabled or disabled, the choice is yours!
    4. Set Availability Zone and User Groups per your requirements or leave as-is
    5. Pick the size (aka decide how much money you want to spend)
      The remaining options are going to depend on your needs, but the defaults are fine for our purposes.
    6. Click Create.  Expect approximately 5 minutes to create your virtual warehouse.

new-hive-vdw.png

Download the JDBC Jar

From your virtual warehouse tile, click on the kebab icon in the upper right. You'll find all sorts of fun options under there, but we're primarily interested in the Download JDBC Jar option, which will download the Hive jar to your local machine. You can leave it in your Downloads folder, or move it to wherever you like to store your jars. It will be named similar to this:

 

hive-jdbc-3.1.0-SNAPSHOT-standalone.jar

 

hive-download-jar.png

Copy the JDBC URL

Again, from the kebab icon in your virtual warehouse tile, copy the JDBC URL. This URL has all the necessary information to make the connection, and should be of the form:

 

jdbc:hive2://hs2-<virtual warehouse name>.<cdp environment name>.a465-9q4k.cloudera.site/default;transportMode=http;httpPath=cliservice;socketTimeout=60;ssl=true;retries=3;

 

hive-copy-jdbc-url.png

Create a New Hive Connection in DBeaver

Next, we will create a new connection within DBeaver.  

  1. Create a new connection in DBeaver, selecting Apache Hive as the database driver.
  2. For the Host, paste the JDBC URL you copied earlier. You'll need to remove the jdbc:hive2:// prefix, since the DBeaver JDBC URL template automatically adds that in for you.
  3. Leave the port empty
  4. Set the Database/Schema to the name of the database you want to connect in as (i.e. default)
  5. Username/Password:
    • If the warehouse is SSO-enabled, use your SSO credentials.
    • If the warehouse is not SSO-enabled, use your CDP workload credentials.
  6. Add the Hive jar
    1. Click on the Edit Driver Settings button
    2. DBeaver installs with a driver for Hive, but you may find it to not be fully compatible with your virtual warehouse. Open the Libraries tab and Delete the existing drivers to avoid any conflict.
    3. Click Add File to add the Hive jar you downloaded earlier.
    4. Click OK
  7. Click Test Connection and verify that you can connect. If your virtual warehouse is SSO-enabled, DBeaver will open a browser tab to allow you to authenticate if you aren't already so authenticated.  
  8. Click Finish to save the new connection.

hive-new-connection.png

Once the connection is created, you can navigate the database, tables, columns, etc, as well as query your data. Congratulations, you did it (and there was much rejoicing).

 

Tips

  • If you have connectivity even after successfully testing your connection, doing an Invalidate/Reconnect or a full Disconnect + Reconnect to reset the connection.
  • If a query seems to take a long time to run, check the status of your virtual warehouse, it is likely that it was stopped and needs to restart to execute the query.
  • You may need your cloud firewall rules set to allow traffic on port 443 from your IP address.

Credit to @Royles (Chris Royles) for his similar article about connecting DBeaver to Phoenix/OpDB.

4,384 Views
Comments

Nice one sir!