Cloudera Employee


How do we quickly gain insight and start working with data in a secure, governed, and scalable environment in the cloud?


This article explains how to achieve this using the Cloudera Data Warehouse platform connected with Apache Superset.


Cloudera Data Warehouse in CDP (Cloudera Data Platform) is an enterprise solution for modern analytics. It is an auto-scaling, highly concurrent, and cost-effective hybrid and multi-cloud analytics solution that ingests data from anywhere, at massive scale, from structured, unstructured, and edge sources.


Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application.


This exercise was performed on macOS. The versions below were tested at the time of writing and may change in the future:







Python 3.7.5 
pip 20.0.2 





After installing Python and pip, install the following packages and versions (we recommend creating a venv before this step):










Apache Superset Configuration

Apache Superset can be installed directly on your machine or run in a Docker environment. This example follows the Python virtualenv installation steps, with the following version:










After setting up the environment, you can access the Superset UI at the following address:







Figure 1: Welcome to Apache Superset

The default username/password is admin/admin.

Cloudera Data Warehouse

If you don't have an Impala Virtual Warehouse (used in this example), you need to create one that connects to the Database Catalog. This is a simple step that takes only a few minutes. Once the Virtual Warehouse is created, and if your Database Catalog already has the tables, security, and metadata definitions in place, you or your application (in our case, Apache Superset) can start using the platform. More information can be obtained in this link.


Figure 2: Cloudera Data Warehouse


Here, we will use the "default-impala" Virtual Warehouse. Because the Virtual Warehouse is stopped and nobody is using it, it consumes no resources. After creating the Virtual Warehouse, collect the access URL used to connect to your environment, as in the following example:


Figure 3: Getting Access URL in Cloudera Data Warehouse

Once you save the access URL, you can configure the Dashboard in Apache Superset.

Configure Cloudera Data Warehouse as Source Database

With the prerequisites in place, we'll configure the connection in Apache Superset. To add Cloudera Data Warehouse as a source database, perform the following steps:

  1. Click Source > Database in the top left menu:

     Figure 4: Configuring Source Database
  2. In the top right corner, click the "Add new record" button:

     Figure 5: Add new database button
  3. Enter the configuration on the following screen:

     Figure 6: Configuring Database


Database Name: Choose a name, for example "ClouderaDataPlatform".

SQLAlchemy URI: Use the access URL obtained from the Cloudera console, customized so that it uses impyla and the URL format supported by SQLAlchemy:

From: jdbc:impala://;AuthMech=3;transportMode=http;httpPath=cliservice;ssl=1;UID=luizcarrossoni;PWD=PASSWORD
To: impala://

Expose in SQL Lab: Checked

Allow Multi Schema Metadata Fetch: Checked

Extra: Here, we pass our Cloudera Data Platform access credentials. Apache Superset offers other, more secure ways to do this.

    {
      "metadata_params": {},
      "engine_params": {
        "connect_args": {
          "user": "<cdpuser>",
          "password": "<password>"
        }
      }
    }
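The customization of the JDBC access URL into a SQLAlchemy URI can be sketched in Python. This is a minimal illustration, not the author's procedure: the helper name `jdbc_to_sqlalchemy_uri` and the host `host.example.com` are hypothetical, and the query parameter names (`auth_mechanism`, `use_http_transport`, `http_path`, `use_ssl`) are assumed to mirror the keyword arguments of impyla's `connect()`. Verify them against the impyla version you installed. The `UID` and `PWD` properties are dropped on purpose, since the credentials are supplied in the Extra field.

```python
from urllib.parse import urlencode


def jdbc_to_sqlalchemy_uri(jdbc_url: str) -> str:
    """Translate a jdbc:impala:// URL into an impala:// SQLAlchemy URI (sketch)."""
    prefix = "jdbc:impala://"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with jdbc:impala://")

    # The JDBC URL is host[:port][/database] followed by ';'-separated properties.
    location, *props = jdbc_url[len(prefix):].split(";")
    if not location:
        raise ValueError("the JDBC URL has no host")
    settings = dict(p.split("=", 1) for p in props if "=" in p)

    # Assumed mapping from JDBC driver properties to impyla connect() keywords.
    query = {}
    if settings.get("AuthMech") == "3":  # username/password authentication
        query["auth_mechanism"] = "PLAIN"
    if settings.get("transportMode") == "http":
        query["use_http_transport"] = "True"
    if "httpPath" in settings:
        query["http_path"] = settings["httpPath"]
    if settings.get("ssl") == "1":
        query["use_ssl"] = "True"

    # UID and PWD are intentionally not copied into the URI.
    uri = f"impala://{location}"
    return f"{uri}?{urlencode(query)}" if query else uri


# Hypothetical access URL; replace it with the one copied from the console.
example = (
    "jdbc:impala://host.example.com:443/default;AuthMech=3;"
    "transportMode=http;httpPath=cliservice;ssl=1;UID=someuser;PWD=secret"
)
print(jdbc_to_sqlalchemy_uri(example))
```

Keeping the password out of the URI means it does not appear wherever Superset displays the connection string.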



After providing the configuration, click the Test button in the SQLAlchemy URI field to check that everything is working properly. If the Virtual Warehouse is in the Stopped state, the test first starts it and then reports that the connection was successful:


Figure 7: Starting Virtual Warehouse


Figure 8: Connection Successful

Now you can save the connection and start creating your dashboards.

Query Data through SQL Lab

You can query the data in the Virtual Warehouse using SQL Lab in Superset:


Figure 9: Query Data in SQL Lab

Note: Because the queried table contains a column that holds PII (ccnumber), the values in that column are returned as hashes. This happens because the following masking policy is in place for the user:



 Figure 10: Masking Policy
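To make the effect of a hash mask concrete, here is a small Python illustration. It is only a sketch: the real masking is applied on the platform side by the policy, not in Superset or in client code. The function name `mask_hash`, the sample rows, and the choice of SHA-256 are assumptions made for this example, and the hash function actually used depends on the engine and the policy configuration.

```python
import hashlib


def mask_hash(value: str) -> str:
    """Return a hex digest that stands in for the original value (illustrative)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Made-up sample rows; the card numbers are well-known test values, not real data.
rows = [
    {"name": "Alice", "ccnumber": "4111111111111111"},
    {"name": "Bob", "ccnumber": "5500000000000004"},
]

# Only the sensitive column is replaced; the other columns pass through unchanged.
masked = [{**row, "ccnumber": mask_hash(row["ccnumber"])} for row in rows]

for row in masked:
    print(row["name"], row["ccnumber"][:12] + "...")
```

Because the same input always yields the same digest, a masked column can still be grouped or counted in a chart without exposing the underlying values.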

Create your Dashboard

To create the Dashboard using Apache Superset in Cloudera Data Platform, do the following:

  1. Add the table as a source in the following menu:

     Figure 11: Add Table Source
  2. Add the ww_customers_data table to start creating the dashboard:

     Figure 12: Create Source Table
  3. Create charts from the source table you just added and use them in a dashboard:



Last update: 06-10-2020 10:54 PM