Created on 11-30-2023 12:42 AM - edited on 11-30-2023 12:49 AM by VidyaSargur
Introduction
This article demonstrates how a Machine Learning (ML) engineer can use the Model Registry service in Cloudera Machine Learning to catalog, version, and deploy models. The Model Registry can serve as a catalog for models in the Data Lake, and it also helps provide model lineage information for deployed models for Data and Administrators. For details on how to use the Model Registry and for additional documentation, review the references section below.
Creating the Model Registry for the Data Lake
The Model Registry is one of the core building blocks of MLOps, or DevOps for data science workflows. Note that a single model registry is created per CDP Data Lake, and it serves as the model catalog for all models in that Data Lake. To create a model registry, open the CML Control Plane and create a new model registry. If a model registry already exists for that Data Lake / CDP environment, you will not be allowed to create a new one. The screen below shows the creation of the Model Registry. Note that the creation steps may differ depending on the chosen cloud provider (for example, in Azure you may be asked to provide NFS details). Refer to Cloudera documentation here for more details based on the form factor chosen.
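The one-registry-per-environment rule described above can be pictured with a small Python sketch. This is only an illustration of the constraint, not Cloudera's API: the class `ControlPlaneSketch`, the exception `RegistryAlreadyExists`, and the environment and registry names are all hypothetical.

```python
class RegistryAlreadyExists(Exception):
    """Raised when an environment already has a model registry (hypothetical)."""


class ControlPlaneSketch:
    """In-memory stand-in for the control plane; NOT a Cloudera API."""

    def __init__(self):
        # Maps an environment (Data Lake) name to its single registry name.
        self._registries = {}

    def create_model_registry(self, environment, registry_name):
        # Only one registry is allowed per environment, so refuse a second one.
        if environment in self._registries:
            raise RegistryAlreadyExists(
                f"{environment} already has registry {self._registries[environment]}"
            )
        self._registries[environment] = registry_name
        return registry_name

    def registry_for(self, environment):
        # Returns the registry name, or None if the environment has none yet.
        return self._registries.get(environment)


plane = ControlPlaneSketch()
plane.create_model_registry("env-a", "registry-1")  # first creation succeeds

try:
    plane.create_model_registry("env-a", "registry-2")  # second one is refused
    second_creation_refused = False
except RegistryAlreadyExists:
    second_creation_refused = True
```

In this sketch, every model in `env-a` would be cataloged under `registry-1`, which mirrors the idea that the registry is shared across the whole Data Lake rather than created per workspace or per project.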
Once the creation process is initiated, you should be able to see the details of the registry creation process by clicking on the registry name and looking at the Event History logs, as shown below:
Setting up access to the Model Registry
As mentioned earlier, setting up Model Registry access differs slightly based on the type of CDP environment. Here, a RAZ-enabled environment is used (a CDP Data Lake with access control mechanisms configured through Apache Ranger). First, copy the Machine User Workload User Name from the Model Registry Details page below:
Here, a non-RAZ-enabled development environment is used. As a first step, identify the model user and use it to set up the access permissions for my user.
This concludes the one-time setup needed for the Model Registry for the Data Lake. To understand how to store models in the Model Registry and deploy them in the Cloudera Machine Learning service, refer to this article.