
Dump Data into Hadoop with realtime encryption and Hierarchical Masking of some columns of data


New Contributor

Say I have a dataset/table (banking sector) with the following details:

Name | Mob.no | AccountNo | Address | SSN | Salary....... |and so on.

john | 123456 | 987654321 | abx 123 | 1122 | 28000

I have to dump this into Hadoop. While dumping, I want the `AccountNo` and `SSN` columns to be encrypted as they are stored in HDFS.

This is the first part. Now, when I am retrieving the results:

1. Decryption should happen first.

2. After that, I want to mask some of the columns.

Say two people (CEO and Project Manager) are viewing the results for `john`. The CEO should be able to see all the details (columns) after decryption. For the Project Manager, the `AccountNo` and `Salary` columns should be masked. For example:

Name | Mob.no | AccountNo | Address | SSN | Salary....... |and so on.

john | 123456 | 9876xxxxxx | abx 123 | 1122 | xxxxx

Is there any way to achieve this in Hadoop?

1. Encrypting columns of data while dumping into HDFS.

2. Masking columns based on hierarchy.

Any leads would be appreciated, since I am new to Hadoop.

2 REPLIES

Re: Dump Data into Hadoop with realtime encryption and Hierarchical Masking of some columns of data

Guru
@Shubham Ringne

For column-level encryption, you can try UDF-based encryption; for reference, see https://issues.apache.org/jira/browse/HIVE-6329

As for masking, you can achieve it either with a UDF or with views (using CASE statements), and of course by restricting access to the table/views.
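To illustrate the view/CASE idea, here is a minimal Python sketch of the role-based masking logic that such a view or UDF would encode. It is not Hive code and not part of the original reply: the role names, the `MASKING_RULES` table, and the helpers `mask_account`, `mask_all`, and `view_row` are all hypothetical, and it assumes the row has already been decrypted.

```python
# Hypothetical sketch: per-role column masking, mirroring a CASE-per-column view.
# Assumes values arrive already decrypted and as strings.

def mask_account(value: str, visible: int = 4) -> str:
    """Keep the first `visible` characters, replace the rest with 'x'."""
    return value[:visible] + "x" * max(len(value) - visible, 0)


def mask_all(value: str) -> str:
    """Replace every character with 'x'."""
    return "x" * len(value)


# Assumed hierarchy: the CEO sees everything, the Project Manager sees
# AccountNo partially masked and Salary fully masked.
MASKING_RULES = {
    "ceo": {},
    "project_manager": {"AccountNo": mask_account, "Salary": mask_all},
}


def view_row(row: dict, role: str) -> dict:
    """Return a copy of `row` with the masking rules for `role` applied."""
    if role not in MASKING_RULES:
        # Unknown roles get no access, analogous to restricting the table/view.
        raise PermissionError(f"role {role!r} may not read this table")
    rules = MASKING_RULES[role]
    return {
        col: rules[col](val) if col in rules else val
        for col, val in row.items()
    }


if __name__ == "__main__":
    john = {
        "Name": "john",
        "MobNo": "123456",
        "AccountNo": "987654321",
        "Address": "abx 123",
        "SSN": "1122",
        "Salary": "28000",
    }
    print(view_row(john, "ceo"))
    print(view_row(john, "project_manager"))
```

In Hive itself, the same decision would typically live in a view (one CASE expression per sensitive column, keyed on the current user or group) or in a masking UDF, with direct access to the base table restricted.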

Re: Dump Data into Hadoop with realtime encryption and Hierarchical Masking of some columns of data

Contributor

@Shubham Ringne Column masking can also be achieved via Ranger (used for security administration and management) if you are using Hive for querying. Please refer to https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_security/content/ranger_column_masking_i...
