Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to Mask the Hive columns data in Join operator, due to data criticality

Solved Go to solution
Highlighted

How to Mask the Hive columns data in Join operator, due to data criticality

Expert Contributor

Scenario as below

1) Have two tables A and B , having the column Account_Number.

2) If we did the join it should run fine and give result with Account_Number.

3) However this Account_number to be masked in out put ( its critical data ).

4) we have get out put, Account_number should not display all data, its should be like XXXXXX12345.

Solition 1: In Ranger we can disable the Account_number colum name, if we did it join objeration will fail.

Join has to run on Account_number only and this data need to masked ?

Is it possible to do.

Your concerns are appriciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: How to Mask the Hive columns data in Join operator, due to data criticality

Explorer

@zkfs, mask_show_last_n udf can be used for the scenario you mentioned.

You can run this query from the beeline to get additional details about udf:

DESCRIBE FUNCTION extended mask_show_last_n;

+---------------------------------------------------------------------------------------------------------------------------------+--+

|                                                            tab_name                                                             |

+---------------------------------------------------------------------------------------------------------------------------------+--+

| masks all but last n characters of the value                                                                                    |

| Examples:                                                                                                                       |

|    mask_show_last_n(ccn, 8)                                                                                                     |

|    mask_show_last_n(ccn, 8, 'x', 'x', 'x')                                                                                      |

|  Arguments:                                                                                                                     |

|    mask_show_last_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar)                                   |

|      value      - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR                         |

|      charCount  - number of characters. Default value: 4                                                                        |

|      upperChar  - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X'  |

|      lowerChar  - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x'  |

|      digitChar  - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n'       |

|      otherChar  - character to replace all other characters with. Specify -1 to retain original character. Default value: -1    |

|      numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1'                           |

| NULL                                                                                                                            |

+---------------------------------------------------------------------------------------------------------------------------------+--+

Following query returns string with all the characters masked except last 4:

select mask_show_last_n('AAbb1234567', 4, 'X', 'X', 'X');
+--------------+--+

|     _c0      |

+--------------+--+

| XXXXXXX4567  |

+--------------+--+

1 row selected (0.085 seconds

View solution in original post

3 REPLIES 3
Highlighted

Re: How to Mask the Hive columns data in Join operator, due to data criticality

Explorer

You can use data masking udfs instead of disabling the column.

You can find more details from here:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DataMaskingFun...

https://issues.apache.org/jira/browse/HIVE-13568

Highlighted

Re: How to Mask the Hive columns data in Join operator, due to data criticality

Expert Contributor

thanks for your reply.

Have you implemented the UDF for this Masking the columnn, if yes please share sample code..

Thank you

Highlighted

Re: How to Mask the Hive columns data in Join operator, due to data criticality

Explorer

@zkfs, mask_show_last_n udf can be used for the scenario you mentioned.

You can run this query from the beeline to get additional details about udf:

DESCRIBE FUNCTION extended mask_show_last_n;

+---------------------------------------------------------------------------------------------------------------------------------+--+

|                                                            tab_name                                                             |

+---------------------------------------------------------------------------------------------------------------------------------+--+

| masks all but last n characters of the value                                                                                    |

| Examples:                                                                                                                       |

|    mask_show_last_n(ccn, 8)                                                                                                     |

|    mask_show_last_n(ccn, 8, 'x', 'x', 'x')                                                                                      |

|  Arguments:                                                                                                                     |

|    mask_show_last_n(value, charCount, upperChar, lowerChar, digitChar, otherChar, numberChar)                                   |

|      value      - value to mask. Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR                         |

|      charCount  - number of characters. Default value: 4                                                                        |

|      upperChar  - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X'  |

|      lowerChar  - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x'  |

|      digitChar  - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n'       |

|      otherChar  - character to replace all other characters with. Specify -1 to retain original character. Default value: -1    |

|      numberChar - character to replace digits in a number with. Valid values: 0-9. Default value: '1'                           |

| NULL                                                                                                                            |

+---------------------------------------------------------------------------------------------------------------------------------+--+

Following query returns string with all the characters masked except last 4:

select mask_show_last_n('AAbb1234567', 4, 'X', 'X', 'X');
+--------------+--+

|     _c0      |

+--------------+--+

| XXXXXXX4567  |

+--------------+--+

1 row selected (0.085 seconds

View solution in original post

Don't have an account?
Coming from Hortonworks? Activate your account here