Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (2)
avatar
Contributor

For a recent project I was tasked with configuring DBeaver to connect to Phoenix running in an instance of the CDP Datahub. CDP provides a means of creating a Datahub for running an Operational Database (HBase) and using JDBC via Phoenix to query it.

Let's start a Datahub from an Operational Database template.

Provision an Operational Database

  1. Log into a CDP Instance
  2. Select Data Hub Clusters
  3. Select Create Data Hub
  4. Choose the Cloud environment
  5. Choose the template 7.2.2 - Operational Database with SQL
  6. Provide a unique name for the cluster
  7. Click Provision Cluster

Screenshot 2021-01-20 at 10.50.52.png

 

This will start a Datahub cluster running HBase and Phoenix as well as all of the security dependencies provided by SDX. For example, Knox - this will be important when connecting to our instance.

 

Once the cluster has started we need to collect some configuration details. This is best done on the Datahub information page and in the Cloudera Manager Console. We will use the Phoenix thin driver, and this requires a JDBC string of the form.

 

jdbc:phoenix:thin:url=https://<knox endpoint>:443/<cluster name>/cdp-proxy-api/avatica/;serialization=PROTOBUF;authentication=BASIC;avatica_user=<workload username>;avatica_password=<workload password>

 

Once the cluster has started, select Endpoints and make a note of the Phoenix Query Server URI. It should look like this:

 

https://<server>/opdbtest/cdp-proxy-api/avatica/

 

The path details here are important as they provide the proxy and the cluster name format that we need.

The next piece of information we need is the Knox server endpoint. This can be found in the Cloudera Manager Console, under Knox/ Instances. This will replace the <server> part above.

The final component we need is the JAR file containing the Phoenix Thin Client, and that can be sourced from the Cloudera repository here.

https://repository.cloudera.com/

Search for phoenix-queryserver-client

Download the latest release.

Configuring DBeaver

To Install DBeaver, you can download a version from https://dbeaver.io/

In this example, we are using the OSX version. Configuration fields and terms may vary my installed type.

  1. Create. new Apache Phoenix connection to provide a baseline
  2. For the host, use the machine that Knox is running on
  3. For the port, use 443 (default https port)
  4. Provide your workload username and password
  5. Edit Driver Configuration and set

    Class name: org.apache.phoenix.queryserver.client.Driver

    URL Template:
    jdbc:phoenix:thin:url={host}[:{port}]/opdbtest/cdp-proxy-api/avatica/;serialization=PROTOBUF;authentication=BASIC;avatica_user={user};avatica_password={password}

  6. Add the driver JAR using add File and select the JAR downloaded from the Cloudera repository
    Note: Don't search for class as it may automatically discover an invalid Driver class. The Class name configuration will override this. You may need to restart DBeaver, if the class is set incorrectly.
  7. Close the configuration and test the connection.
    Screenshot 2021-01-20 at 11.14.49.png

The URL uses Knox so that access control can be managed centrally. Knox takes the https:// messages and proxies them through to the backend Phoenix services automatically.

A note on Operational Database Experience

Shortly we will be providing an Operational Database Experience. We have significantly streamlined the provisioning of the Datahubs and publish a lot more metadata to help with configuring external clients. For example, the Maven links to the correct clients are provided directly and examples of the JDBC links are presented right in the user interface. 

All of these improvements have been made to help make provisioning new instances easy, and to make connecting to those instances from applications and tools very quick. Our objective is to help you integrate CDP with your applications quickly and efficiently. We welcome your feedback on areas of our platform and documentations, which can be improved to help us with this goal.

Tips

  • If you receive 404 or 401 errors, please check you are connecting to Knox, and that the full https:// url is correct.
  • If you receive errors related to serialisation, make sure you have serialization=PROTOBUF set.

Documentation references

Setting up connections with a CDP Datahub

Connect to PQS through Apache Knox

Connecting to Apache Phoenix Query Server using the JDBC client

 

Setting up connections with CDP Operational Database Experience

Cloudera Operational Database JDBC support

5,313 Views
Comments

Great article!  With the introduction of the OpDB experience (aka COD), there are some worthwhile updates to make here.   Namely, there is a direct download link to the Phoenix Thin client, and also the Phoenix Thin URL is provided for you (no need to determine the knox host).

 

With that comes a complication on the DBeaver side.   Easiest thing to do is to blank out the URL Template, which allows you to directly paste the Thin URL provided by COD, making sure to update your workload password.

 

Alternatively, you can parameterize the URL & secure the password by changing the URL template to look like this, along with unchecking the "No Authentication" box.

 

jdbc:phoenix:thin:url={host};avatica_user={user};avatica_password={password}

 

The complication is that you would need to only copy a portion of the URL up to the user section to get the host.