Welcome to Part 2 of our Harness the Hybrid Cloud series. In this tutorial, we will use Data Catalog, Atlas, and Ranger to profile and protect sensitive data in CDP Public Cloud, as depicted below:

Screen Shot 2020-11-13 at 11.21.22 AM.png

Prerequisites

  • Complete Part 1 of the series

Step 1: Launch Data Profiling

CDP Data Catalog includes data profilers out of the box. You can customize them, but for our datasets we will use the standard profilers.

Launch Profiler Cluster

  1. Navigate to your CDP Management Console > Data Catalog > select your environment > Launch Profilers:
    Screen Shot 2020-11-12 at 1.42.01 PM.png
  2. This launches a Data Hub cluster that runs the data profiling Spark jobs. Wait for the cluster to finish building, as shown in the following screenshot:
    Screen Shot 2020-11-13 at 11.35.27 AM.png

Verify Profiler execution

Navigate back to Data Catalog > Profilers > select your environment > Cluster Sensitivity Profiler, and verify that the profilers have run successfully:

Screen Shot 2020-11-13 at 11.39.18 AM.png

Check profiled data

  1. Go to Search and find the employees Hive table:
    Screen Shot 2020-11-13 at 11.49.06 AM.png
  2. In the employees table, go to Schema and check the tags that the profiler created automatically:
    Screen Shot 2020-11-13 at 11.54.47 AM.png
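This article does not describe the profiler's actual detection rules. As a rough illustration only, a column could be flagged as credit-card data when most of its values are 13 to 19 digits long (after removing spaces and dashes) and pass the Luhn checksum. The sketch below assumes that heuristic; the function names and the 0.8 threshold are hypothetical and are not Data Catalog's implementation.

```python
import re

# Hypothetical heuristic: 13-19 digits once spaces and dashes are stripped.
_CARD_DIGITS = re.compile(r"\d{13,19}")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def looks_like_card(value: str) -> bool:
    """True if one value has the shape of a card number and a valid checksum."""
    digits = value.replace(" ", "").replace("-", "")
    return _CARD_DIGITS.fullmatch(digits) is not None and luhn_valid(digits)


def column_is_card_like(values, threshold: float = 0.8) -> bool:
    """Flag a column when at least `threshold` of its non-empty values look like cards."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    hits = sum(looks_like_card(v) for v in non_empty)
    return hits / len(non_empty) >= threshold
```

For example, a column holding the well-known test numbers 4111 1111 1111 1111 and 5500-0000-0000-0004 would be flagged, while a column dominated by free text would not.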

Step 2: Create a Tag-Based Policy

Navigate to Ranger

  1. In Data Catalog, go to the Policy tab and click a policy to open Ranger:
    Screen Shot 2020-11-13 at 12.14.10 PM.png
  2. In Ranger, go to Tag Based Policies:
    Screen Shot 2020-11-13 at 12.15.11 PM.png
  3. Open the cm_tags service:
    Screen Shot 2020-11-13 at 12.15.39 PM.png
  4. Go to the Masking tab and add a new policy:
    Screen Shot 2020-11-13 at 12.20.08 PM.png

Create Masking Rule

Configure the masking rule as depicted in the following screenshot:

Screen Shot 2020-11-13 at 12.27.07 PM.png
  1. Give it a name (for example, mask_creditcard)
  2. Select the dp_creditcard tag (the dp prefix stands for data profiler)
  3. Select the group or user to which this policy applies (here, pvidal)
  4. Select Access Type: Hive, Select
  5. Select Masking Option: Redact
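The same masking policy can, in principle, be created through Ranger's public REST API instead of the UI. The sketch below assumes the v2 endpoint `/service/public/v2/api/policy`, HTTP basic authentication, and the `hive:select` access type and `hive:MASK` mask type (assumed to be what the UI labels Redact) as they are usually named in a tag service. The Ranger URL and credentials are placeholders you must supply, authentication in CDP Public Cloud may differ, and the tag name should match what the profiler actually created in your Data Catalog.

```python
import base64
import json
import urllib.request


def build_tag_mask_policy(name, tag, users, service="cm_tags", mask_type="hive:MASK"):
    """Build a Ranger tag-based data-masking policy as a dict (assumed v2 JSON shape)."""
    return {
        "service": service,  # tag service shown in the Ranger UI
        "name": name,
        "policyType": 1,  # 0 = access, 1 = data masking, 2 = row filter
        "isEnabled": True,
        "resources": {"tag": {"values": [tag], "isExclude": False, "isRecursive": False}},
        "dataMaskPolicyItems": [
            {
                "users": list(users),
                "accesses": [{"type": "hive:select", "isAllowed": True}],
                # Assumed to correspond to the "Redact" masking option in the UI.
                "dataMaskInfo": {"dataMaskType": mask_type},
            }
        ],
    }


def create_policy(ranger_url, username, password, policy):
    """POST the policy to Ranger; the URL and basic-auth credentials are placeholders."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        ranger_url.rstrip("/") + "/service/public/v2/api/policy",
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Values from this tutorial; the tag name may differ in your Data Catalog.
    policy = build_tag_mask_policy("mask_creditcard", "dp_creditcard", ["pvidal"])
    print(json.dumps(policy, indent=2))
```

Printing the payload before calling `create_policy` lets you compare it against a policy created in the UI (Ranger can export existing policies as JSON) and adjust any field names that differ in your Ranger version.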

Step 3: Verify Security Rule

  1. Go back to the Management Console, open Data Warehouse, and open Hue for your Virtual Warehouse:
    Screen Shot 2020-11-13 at 12.32.51 PM.png
  2. Run the following query and observe masked results:
    select ccnumber from worldwidebank.employees
    Screen Shot 2020-11-13 at 12.39.00 PM.png
  3. As you can see, the ccnumber values are redacted for this user. CDP makes it easy to secure your data in the cloud. Next step: enrich this data with NiFi!
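To predict what the redacted ccnumber column should look like: Hive's built-in mask() function is documented to replace uppercase letters with X, lowercase letters with x, and digits with n by default, leaving other characters unchanged, and as far as we know Ranger's Redact option relies on that function for Hive. The sketch below emulates that default for ASCII input; it is an approximation rather than Ranger's or Hive's code, the function name is hypothetical, and the sample card number is made up.

```python
def hive_style_redact(value: str, upper: str = "X", lower: str = "x", digit: str = "n") -> str:
    """Approximate Hive's default mask(): letters and digits replaced, others kept (ASCII only)."""
    masked = []
    for ch in value:
        if "A" <= ch <= "Z":
            masked.append(upper)
        elif "a" <= ch <= "z":
            masked.append(lower)
        elif "0" <= ch <= "9":
            masked.append(digit)
        else:
            masked.append(ch)  # separators such as '-' or ' ' pass through unchanged
    return "".join(masked)


print(hive_style_redact("4532-0151-1283-0366"))  # nnnn-nnnn-nnnn-nnnn
```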
