From Silos to Synergy: Unifying Data Warehousing and AI

Cloudera Employee

Introduction
Data fuels AI. The more data that is available, the more value AI tools can create from it. Cloudera Data Warehouse is an industry-leading solution for storing and managing data, and Cloudera Machine Learning provides the tools necessary to build, train, and deploy AI models and applications. Now, those solutions work together better than ever before.

Previous versions of Cloudera Data Warehouse and Cloudera Machine Learning required duplicate authentication to integrate. Users first authenticated with Cloudera Machine Learning and then had to re-authenticate with Cloudera Data Warehouse once they were ready to retrieve data from their warehouses. Now, Cloudera Machine Learning seamlessly authenticates the user to Cloudera Data Warehouse without requiring the user to re-enter their credentials. In this blog, we will show how the integration between these two services works so users can leverage all of their data for AI.

Let’s Get Going
The first step is to set up an Impala or Hive virtual warehouse to support JSON Web Token (JWT) authentication. When creating a new Hive/Impala Virtual Warehouse on the Cloudera Data Warehouse management console, expand the “Authorization” section and check the “Enable JWT Authentication” box. Users can configure existing virtual warehouses to use JWT authentication by editing them and selecting “Enable JWT Authentication” under the “Permissions” tab. Making this configuration change will automatically restart the virtual warehouse.

Note: the data lake for the environment must be version 7.2.18 or later.

Figure: Create Impala Virtual Warehouse

The next several steps happen in Cloudera Machine Learning.

  1. Create a workspace in Cloudera Machine Learning and make sure the data connections page displays the virtual warehouses created via the Cloudera Data Warehouse management console.
    Figure: Cloudera Data Engineering Data Connections
  2. Create a project. Now, you are ready to connect to a virtual warehouse. If the data lake version is 7.2.18 or later and the virtual warehouse has JWT authentication enabled, then the data connection to the virtual warehouse uses JWT authentication. If you wish to use password authentication, then disable JWT authentication on the virtual warehouse in the Cloudera Data Warehouse management console.
  3. Any workloads (sessions, jobs, applications, or models) created in the project will now connect to the virtual warehouse using a JWT and will not prompt for a username and password.
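
The rule in step 2 for choosing between JWT and password authentication can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not Cloudera's implementation: the function and constant names are hypothetical, and falling back to password authentication on a data lake older than 7.2.18 is an assumption.

```python
from typing import Tuple

# Minimum data lake version stated in this post for JWT authentication.
MIN_JWT_DATA_LAKE_VERSION = (7, 2, 18)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "7.2.18" into (7, 2, 18)."""
    return tuple(int(part) for part in version.split("."))


def connection_auth_mode(data_lake_version: str, jwt_enabled: bool) -> str:
    """Return "jwt" or "password" for a data connection (hypothetical helper).

    JWT is used only when the virtual warehouse has JWT authentication
    enabled AND the data lake is version 7.2.18 or later. Treating every
    other case as password authentication is an assumption of this sketch.
    """
    new_enough = parse_version(data_lake_version) >= MIN_JWT_DATA_LAKE_VERSION
    if jwt_enabled and new_enough:
        return "jwt"
    return "password"


print(connection_auth_mode("7.2.18", True))   # JWT enabled, version is new enough
print(connection_auth_mode("7.2.17", True))   # JWT enabled, data lake too old
print(connection_auth_mode("7.2.18", False))  # JWT disabled on the warehouse
```

Comparing versions as tuples of integers rather than as strings avoids ranking "7.10.0" below "7.2.18".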

Peek Into the Inner Workings
Now that we have gone through the setup steps, let’s take a look at how authentication works. When the virtual warehouse is set up for JWT authentication, it is configured to trust JWTs issued by the Apache Knox instance running in the environment’s data lake. Those JWTs contain a claim with the username of the authenticated user, which is passed to the virtual warehouse. Cloudera Machine Learning, for its part, uses the credentials entered by the user to authenticate with the data lake instance of Apache Knox and obtain a JWT issued by that Knox instance. Cloudera Machine Learning then passes that JWT to the virtual warehouse when it connects, completing the seamless authentication experience.
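
To make the username claim concrete, here is a minimal Python sketch that builds a sample token and reads the username back out of its payload. It makes several assumptions: the username is taken to sit in the standard `sub` claim, the user name `jdoe` is made up, and the sample token is unsigned so the example stays self-contained. A real token issued by Knox is signed, and the virtual warehouse verifies that signature before trusting any claim. This sketch skips verification entirely and should not be used to make trust decisions.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url JWT segment, restoring the stripped padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def make_sample_token(claims: dict) -> str:
    """Build an UNSIGNED sample token (header.payload. with empty signature).

    Hypothetical helper for illustration only; real tokens carry a signature.
    """
    header = {"alg": "none", "typ": "JWT"}
    segments = [
        b64url_encode(json.dumps(part).encode("utf-8"))
        for part in (header, claims)
    ]
    return ".".join(segments) + "."


def read_claims(token: str) -> dict:
    """Return the payload claims of a token WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# "sub" as the username claim is an assumption; "jdoe" is a made-up user.
token = make_sample_token({"sub": "jdoe"})
claims = read_claims(token)
print(claims["sub"])  # prints "jdoe"
```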

Conclusion
By eliminating duplicate logins, Cloudera has again demonstrated its commitment to building a fully integrated and frictionless data platform. With Cloudera Machine Learning authenticating users to Cloudera Data Warehouse on their behalf, users can move from their warehouses to their AI workloads without re-entering credentials and leverage all of their data for AI.