Custom masking expression using UDF in Apache Ranger

Explorer

Hi,

I am using Apache Ranger to define masking policies that hide sensitive information from Hadoop users. I have managed to use masking functions such as mask, mask_last_n, and mask_first_n.

Now I want to create a custom masking rule for the email address column so that the name part of every email address is replaced by "email". For example, john_doe@gmail.com would become email@gmail.com. The custom masking expression using a UDF looks like this:

replace({col}, substr({col}, 0, instr({col},"@")), "email")

But when I use this custom masking expression, any Hive query on that field fails with this error:

java.lang.Exception: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 0:-1 Invalid function 'replace'
	at org.apache.ambari.view.hive2.resources.jobs.JobService.getOne(JobService.java:141)
	at sun.reflect.GeneratedMethodAccessor1289.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
...

The Ranger documentation says the following about the Custom masking type:

Custom – Specify a custom masked value or expression. Custom masking can use any valid Hive UDF (Hive that returns the same data type as the data type in the column being masked).

Masking conditions are evaluated in the order listed in the policy. The condition at the top of the Masking Conditions list is applied first, then the second, then the third, and so on.

As stated here, replace is a Hive UDF that returns a string, which matches the data type of the column. Still, replace cannot be used in the masking expression.

Can someone help me use a Hive UDF with the Custom masking type?

Alternatively, any help with masking the email address would be welcome too.

1 ACCEPTED SOLUTION

Rising Star

@Nhan Nguyen You can create a masking policy for the "email" column with the masking option set to "Custom" and the following value:

concat("email", substr({col}, instr({col}, "@")))

This gives the output described in the example above.
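To see what the suggested expression does, it can be tried on a literal value in Hive before it goes into the policy. This is only a sketch: the sample address stands in for {col}, and the regexp_replace variant is an alternative that is not part of the answer above (regexp_replace is mentioned elsewhere in this thread as accepted by the Custom option). Neither expression does anything special for values that contain no "@".

```sql
-- Sketch: evaluate the masking expression on a literal instead of {col}.
-- instr() returns the 1-based position of '@' (9 for this sample value);
-- substr() with only a start position returns the rest of the string.
SELECT concat('email',
              substr('john_doe@gmail.com', instr('john_doe@gmail.com', '@')));
-- email@gmail.com

-- Alternative (an assumption, not from the answer): replace everything
-- before the first '@' with a regular expression.
SELECT regexp_replace('john_doe@gmail.com', '^[^@]*', 'email');
-- email@gmail.com
```

In the Ranger policy itself, the literal is swapped back for {col}.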


5 REPLIES

avatar
New Contributor

Hello,

Can you please share some sample custom masking expressions, or any relevant web page or link?

Explorer

Thanks @ssanthosh. It works!

It is weird that some string functions are allowed in the "Custom" masking option (substr, regexp_replace) and some are not (replace). Is there any documentation about this?

Rising Star

@Nhan Nguyen Please check whether the replace function works directly on the Hive side.

The replace function is supported as of Hive 1.3.0 and 2.1.0:

https://issues.apache.org/jira/browse/HIVE-13063
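One way to run that check is from a Hive session (for example Beeline), outside of Ranger. This is a rough sketch; the sample address is only an illustration. On a Hive version that predates the replace UDF, the second statement is expected to fail with the same kind of "Invalid function" error shown in the question.

```sql
-- Sketch: check whether this Hive build knows the replace UDF at all.
DESCRIBE FUNCTION replace;

-- If the function exists, this returns the masked sample value directly.
SELECT replace('john_doe@gmail.com', 'john_doe', 'email');
```

If both statements succeed in Hive but the masking policy still fails, the problem is more likely in the policy expression than in the function's availability.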

If you think the answer has solved your query, please accept the answer. Thanks

Explorer

That explains it: we are on Hive 1.2.1000. Thanks @ssanthosh!