Custom masking expression using UDF in Apache Ranger
Labels: Apache Ambari, Apache Hive, Apache Ranger
Created ‎03-22-2017 03:12 PM
Hi,
I am using Apache Ranger to specify masking policies that hide sensitive information from Hadoop users. I have managed to use masking functions such as mask, mask_last_n, mask_first_n, and so on.
Now I want to create a custom masking rule for the email address column so that the name part of every email address is replaced by "email". For example, john_doe@gmail.com becomes email@gmail.com. The custom masking expression using a UDF looks like this:
replace({col}, substr({col}, 0, instr({col},"@")), "email")
But when I use this custom masking expression, any Hive query on that column fails with this error:
java.lang.Exception: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'replace'
java.lang.Exception: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'replace'
at org.apache.ambari.view.hive2.resources.jobs.JobService.getOne(JobService.java:141)
at sun.reflect.GeneratedMethodAccessor1289.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
...
The Ranger documentation says this about the Custom masking type:
Custom – Specify a custom masked value or expression. Custom masking can use any valid Hive UDF (Hive that returns the same data type as the data type in the column being masked). Masking conditions are evaluated in the order listed in the policy. The condition at the top of the Masking Conditions list is applied first, then the second, then the third, and so on.
As stated here, replace is a Hive UDF that returns a string, which is the same data type as the column. Yet it is still not possible to use replace.
Can someone help me with how to use a Hive UDF with the Custom masking type?
Or if you can help me mask the email address, that would be welcome too.
Created ‎03-23-2017 06:22 PM
@Nhan Nguyen You can create a masking policy for the "email" column with the masking option set to "custom" and the value concat("email",substr({col},(instr({col},"@")))). This gives the output described in the example above.
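To see why this expression yields the expected output, here is a minimal Python sketch that imitates the three functions involved. The helpers instr, substr and mask_email are stand-ins written for this illustration, not Hive or Ranger APIs. The sketch assumes 1-based positions, that instr returns 0 when nothing is found, and that substr treats a start of 0 like 1.

```python
def instr(s: str, sub: str) -> int:
    """Return the 1-based position of the first occurrence of sub, or 0 if absent."""
    return s.find(sub) + 1


def substr(s: str, start: int) -> str:
    """Return the suffix of s beginning at 1-based position start.

    Assumption for this sketch: a start of 0 behaves like a start of 1.
    Negative starts are not handled here.
    """
    return s[max(start, 1) - 1:]


def mask_email(value: str) -> str:
    """Imitate concat("email", substr({col}, instr({col}, "@")))."""
    return "email" + substr(value, instr(value, "@"))


if __name__ == "__main__":
    print(mask_email("john_doe@gmail.com"))  # prints email@gmail.com
```

For john_doe@gmail.com, instr finds "@" at position 9, substr keeps "@gmail.com", and the concatenation prepends "email". Note that under the assumptions above, a value with no "@" at all would come back with "email" prepended to the whole string, so it may be worth checking how such rows should be handled.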
Created ‎10-04-2019 02:18 AM
Hello,
Can you please share some sample custom masking expressions? Otherwise, please share any relevant web page or link.
Created ‎03-24-2017 08:06 AM
Thanks @ssanthosh. It works!
It is weird that some string functions are allowed (substr, regexp_replace) and some are not (replace) in the "custom" masking option. Is there any documentation about this?
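Since regexp_replace is accepted, the same masking could probably also be written as regexp_replace({col}, '^[^@]+', 'email'). That expression is an untested assumption, not something confirmed in this thread. The Python sketch below only illustrates what the pattern does, using re.sub as a stand-in; mask_email_regex is a hypothetical name.

```python
import re


def mask_email_regex(value: str) -> str:
    """Replace everything before the first "@" with the literal "email".

    The pattern ^[^@]+ matches one or more non-"@" characters
    anchored at the start of the string.
    """
    return re.sub(r"^[^@]+", "email", value, count=1)


if __name__ == "__main__":
    print(mask_email_regex("john_doe@gmail.com"))  # prints email@gmail.com
```

One difference from the concat approach: with this pattern a value that contains no "@" is replaced entirely by "email", because the pattern then matches the whole string.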
Created ‎03-24-2017 09:03 AM
@Nhan Nguyen Please check whether the replace function works when called directly in Hive.
The replace function is supported as of Hive 1.3.0 and 2.1.0:
https://issues.apache.org/jira/browse/HIVE-13063
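As a rough way to apply that rule, the following Python sketch checks a Hive version string against the two release lines mentioned above. The function names are hypothetical, and the sketch assumes a purely numeric dotted version (for example 1.2.1000) and that 2.0.x releases do not include replace, which follows from the versions quoted here but has not been verified separately.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version into a 3-part tuple, padding with zeros."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_replace_udf(version: str) -> bool:
    """True if the version falls in a line that ships replace: 1.3.0+ on 1.x, or 2.1.0+."""
    ver = parse_version(version)
    return (1, 3, 0) <= ver < (2, 0, 0) or ver >= (2, 1, 0)


if __name__ == "__main__":
    print(has_replace_udf("1.2.1000"))  # prints False
    print(has_replace_udf("2.1.0"))     # prints True
```

Vendor builds sometimes backport features, so running a quick select with replace directly in Hive remains the more reliable check.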
If you think the answer has solved your query, please accept the answer. Thanks
Created ‎04-03-2017 07:12 AM
That explains it, because we have Hive 1.2.1000. Thanks @ssanthosh!
