Created 03-22-2017 03:12 PM
Hi,
I am using Apache Ranger to specify masking policies that hide sensitive information from Hadoop users. I have managed to use masking functions such as mask, mask_last_n, mask_first_n, ...
Now I want to create a custom masking rule for the email address column so that the name part of every address is replaced by "email". For example, john_doe@gmail.com should become email@gmail.com. The custom masking expression, using Hive UDFs, looks like this:
replace({col}, substr({col}, 0, instr({col},"@")), "email")
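To see what this expression would return on a Hive version that does have replace, here is a rough Python emulation. It assumes Hive's instr/substr semantics as I understand them (1-based positions, with a start position of 0 treated as the beginning of the string, as in Hive's UDFSubstr source) and that replace swaps every literal occurrence. The helper names hive_instr, hive_substr and hive_replace are hypothetical, for illustration only.

```python
from typing import Optional


def hive_instr(s: str, sub: str) -> int:
    """Emulate Hive instr(): 1-based position of the first match, 0 if absent."""
    return s.find(sub) + 1


def hive_substr(s: str, pos: int, length: Optional[int] = None) -> str:
    """Emulate Hive substr(s, pos[, length]) for the cases used here.

    Assumption (mirrors Hive's UDFSubstr): positive pos is 1-based, pos 0 is
    treated as the start of the string, negative pos counts from the end, and
    out-of-range input yields an empty string.
    """
    if abs(pos) > len(s) or (length is not None and length <= 0):
        return ""
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = len(s) + pos
    else:
        start = 0
    end = len(s) if length is None else min(len(s), start + length)
    return s[start:end]


def hive_replace(s: str, old: str, new: str) -> str:
    """Emulate Hive replace() (Hive 1.3.0+/2.1.0+): replace every literal match."""
    return s.replace(old, new)


col = "john_doe@gmail.com"

# replace({col}, substr({col}, 0, instr({col},"@")), "email")
as_posted = hive_replace(col, hive_substr(col, 0, hive_instr(col, "@")), "email")

# Same expression with the substring length reduced by one so the "@" is kept
adjusted = hive_replace(col, hive_substr(col, 0, hive_instr(col, "@") - 1), "email")

print(as_posted)  # emailgmail.com
print(adjusted)   # email@gmail.com
```

If the emulation is right, the substring taken here includes the "@" itself, so even where replace is available the expression would need instr({col},"@") - 1 as the length to produce email@gmail.com.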
However, when I use this custom masking expression, every Hive query on that field fails with this error:
java.lang.Exception: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'replace'
java.lang.Exception: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'replace'
    at org.apache.ambari.view.hive2.resources.jobs.JobService.getOne(JobService.java:141)
    at sun.reflect.GeneratedMethodAccessor1289.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    ...
The Ranger documentation says this about the Custom masking type:
Custom – Specify a custom masked value or expression. Custom masking can use any valid Hive UDF (Hive that returns the same data type as the data type in the column being masked). Masking conditions are evaluated in the order listed in the policy. The condition at the top of the Masking Conditions list is applied first, then the second, then the third, and so on.
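As far as I understand it, Ranger does not evaluate a custom expression itself: it substitutes the masked column's name for every {col} placeholder and hands the result to Hive, which compiles it as part of the rewritten query. That would explain why the failure surfaces as a Hive compile error (SemanticException ... Invalid function 'replace') rather than a Ranger error. A minimal sketch of that substitution, assuming a plain textual replacement; render_mask_expr is a hypothetical helper, not a Ranger API.

```python
def render_mask_expr(template: str, column: str) -> str:
    """Hypothetical helper: substitute the column name for every {col} placeholder."""
    return template.replace("{col}", column)


template = 'replace({col}, substr({col}, 0, instr({col},"@")), "email")'

# This rendered expression is what Hive would then have to compile; on a Hive
# version without replace(), compilation fails before any row is read.
print(render_mask_expr(template, "email"))
```

In other words, a custom expression is only valid if the same expression, with the column name filled in, would compile in a plain Hive query on that cluster.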
As stated here, replace is a Hive UDF that returns a string, which matches the data type of the column. Yet I still cannot use replace.
Can someone explain how to use a Hive UDF with the Custom masking type?
Alternatively, any help with masking the email address would be welcome too.
Created 03-23-2017 06:22 PM
@Nhan Nguyen You can create a masking policy for the "email" column with the masking option set to "Custom" and the value concat("email",substr({col},(instr({col},"@")))). This gives the output described in your example above.
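A rough Python emulation of this expression, assuming Hive's 1-based instr (0 when the substring is absent) and that two-argument substr returns everything from the given position to the end, with position 0 treated as the start of the string. The function names are hypothetical and only illustrate the logic.

```python
def hive_instr(s: str, sub: str) -> int:
    """Emulate Hive instr(): 1-based index of the first match, 0 when absent."""
    return s.find(sub) + 1


def hive_substr_from(s: str, pos: int) -> str:
    """Emulate two-argument Hive substr(s, pos) for pos >= 0.

    Assumption: pos is 1-based and 0 is treated as the start of the string.
    """
    return s[max(pos - 1, 0):]


def mask_email(value: str) -> str:
    # concat("email", substr({col}, instr({col}, "@")))
    return "email" + hive_substr_from(value, hive_instr(value, "@"))


for v in ["john_doe@gmail.com", "no-at-sign"]:
    print(v, "->", mask_email(v))
```

Under these assumptions, a value without an "@" is not hidden at all: instr returns 0, so the whole original value is appended after "email". A NULL column value should stay NULL, since Hive's concat returns NULL when any argument is NULL.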
Created 10-04-2019 02:18 AM
Hello,
Can you please share some sample custom masking expressions, or point me to a relevant web page or link?
Created 03-24-2017 08:06 AM
Thanks @ssanthosh. It works!
It is weird that some string functions are allowed in the "custom" masking option (substr, regexp_replace) while others are not (replace()). Is this documented anywhere?
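Since regexp_replace is accepted on this Hive version, the same mask could probably also be written as the custom value regexp_replace({col}, "^[^@]*", "email"), which replaces everything before the first "@". This expression is my own suggestion, not one from this thread. The sketch below checks the pattern with Python's re module, assuming it behaves the same under the Java regex engine Hive uses; for a simple anchored character-class pattern like this one, the two engines agree.

```python
import re

# Assumed Ranger custom value: regexp_replace({col}, "^[^@]*", "email")
LOCAL_PART = re.compile(r"^[^@]*")


def mask_email_regex(value: str) -> str:
    """Replace everything before the first '@' with 'email'.

    The pattern is anchored with ^, so it can match at most once per value.
    """
    return LOCAL_PART.sub("email", value)


print(mask_email_regex("john_doe@gmail.com"))  # email@gmail.com
```

One difference from the concat/substr/instr version: a value with no "@" is replaced entirely (for example "bob" becomes "email") instead of having "email" prepended, which may be the safer behavior for a masking policy.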
Created 03-24-2017 09:03 AM
@Nhan Nguyen Please check whether the replace function works directly in Hive.
The replace function is only available as of Hive 1.3.0 and 2.1.0:
https://issues.apache.org/jira/browse/HIVE-13063
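The version rule above can be expressed as a small check. This is a sketch under stated assumptions: it only compares the first two numeric version components against the releases mentioned here (1.3.0 on the 1.x line, 2.1.0 on the 2.x line), and it assumes vendor builds such as HDP's 1.2.1000 carry no backport of the function. hive_has_replace is a hypothetical helper; running a test query in Hive is the authoritative check.

```python
def hive_has_replace(version: str) -> bool:
    """Return True if a Hive version is at or above 1.3.0 (1.x) or 2.1.0 (2.x).

    Assumption: only major.minor matter and vendor builds have no backports.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if major == 1:
        return minor >= 3
    if major == 2:
        return minor >= 1
    return major > 2


for v in ["1.2.1000", "2.0.1", "2.1.0", "3.1.2"]:
    print(v, hive_has_replace(v))
```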
If you think the answer has solved your query, please accept the answer. Thanks
Created 04-03-2017 07:12 AM
That explains it: we are running Hive 1.2.1000. Thanks @ssanthosh!