Community Articles

Find and share helpful community-sourced technical articles.
avatar
Cloudera Employee

If you made it this far, you probably already know why you're here.  Connecting Dbeaver to an Impala virtual warehouse isn't difficult, but there are a few gotchas that can make it frustrating. So let's conquer those obstacles.

 

Provision an Impala Virtual Warehouse

  1. Log into a CDP instance
  2. Navigate to Data Warehouse
  3. Enable your Data Warehouse Environment & Database Catalog
  4. Create a New Virtual Warehouse
    1. Give your virtual warehouse a unique name
    2. Select Impala
    3. Select your database catalog from the dropdown
    4. SSO may be enabled or disabled, the choice is yours!
    5. Set Availability Zone and User Groups per your requirements or leave as-is
    6. Pick the size (aka decide how much money you want to spend)
    7. The remaining options are going to depend on your needs, but the defaults are fine for our purposes.
    8. Click Create.  Expect approximately 5 minutes to create your virtual warehouse.

 

dbeaver-imp-new.png

 

Download the JDBC Driver

From your virtual warehouse tile, click on the kebab icon in the upper right. You'll find all sorts of fun options under there, but we're primarily interested in the Download JDBC/ODBC Driver option, which will download the Impala jar to your local machine. You can leave it in your Downloads folder, or move it to wherever you like to store your jars. It will be named similar to this:

 

impala_driver_jdbc_odbc.zip

 

 

dbeaver-imp-download-jdbc.png

 

 

You'll need to unzip it, which will create a new folder named impala_driver_jdbc_odbc.  Inside that folder will be two additional folders, we're interested in the JDBC folder, named something like this:

 

ClouderaImpala_JDBC-2.6.23.1028

 

 

Within that folder will be the actual Impala drivers, named for JDBC versions 4.1 and 4.2.  More info on these versions can be found here.  You don't need to unzip these any further.

 

 

ClouderaImpalaJDBC41-2.6.23.1028.zip
ClouderaImpalaJDBC42-2.6.23.1028.zip

 

 

 

 

 

Copy the JDBC URL

Again, from the kebab icon in your virtual warehouse tile, copy the JDBC URL. This URL has all the necessary information to make the connection, and should be of the form:

 

jdbc:impala://coordinator-cnelson2-impala-vdw.dw-se-sandboxx-aws.a465-9q4k.cloudera.site:443/default;AuthMech=12;transportMode=http;httpPath=cliservice;ssl=1;auth=browser

 

 

 

dbeaver-imp-copy-jdbc-url.png

 

Create a New Impala Connection in DBeaver

Next, we will create a new connection within DBeaver.  

  1. Create a new connection in DBeaver, selecting Cloudera Impala as the database driver.
  2. Click Edit Driver Settings to tweak the URL template
    1. Remove the jdbc:impala:// prefix from the URL template
    2. Remove the :{port} from the URL template
    3. The new URL template should look like this:  {host}/{database}
    4. Click OK
  3. For the Host, paste the JDBC URL you copied earlier. 
  4. Leave the port empty
  5. Set the Database/Schema to the name of the database you want to connect in as (i.e. default)
  6. Username/Password:
    • If the warehouse is SSO-enabled, use your SSO credentials.
    • If the warehouse is not SSO-enabled, use your CDP workload credentials.
  7. Add the Impala driver
    1. Click on the Edit Driver Settings button
    2. DBeaver may have installed with a driver for Impala, but you may find it to not be fully compatible with your virtual warehouse. Open the Libraries tab and Delete the existing drivers to avoid any conflict.
    3. Click Add File to add the Impala driver (the 41 or 42 zip file) you downloaded earlier.
    4. Click OK
  8. Click Test Connection and verify that you can connect. If your virtual warehouse is SSO-enabled, DBeaver will open a browser tab to allow you to authenticate if you aren't already so authenticated.  
  9. Click Finish to save the new connection.

dbeaver-imp-url-template.png

 

dbeaver-imp-new-connection.png

 

Once the connection is created, you can navigate the database, tables, columns, etc, as well as query your data. Congratulations, you did it (and there was much rejoicing).

 

Tips

  • If you have connectivity even after successfully testing your connection, doing an Invalidate/Reconnect or a full Disconnect + Reconnect to reset the connection.
  • If a query seems to take a long time to run, check the status of your virtual warehouse, it is likely that it was stopped and needs to restart to execute the query.
  • You may need your cloud firewall rules set to allow traffic on port 443 from your IP address.

 

You might also be interested in my eerily similar article on connecting Dbeaver to a Hive Virtual Warehouse.

1,998 Views