Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

Querying a Hive Table (via Hiveserver2) with Column Masking enabled via Ranger Hive Plugin returns with an error.

New Contributor

I am running a HDP 2.5.3 cluster with HiveServer2 enabled on an edge node. Ranger is enforcing security. In addition, I've created a policy in ranger that redacts a certain column with pii on select.

12247-y2sye.png

Unfortunately, when I execute the following query in a hive client:

select * from <tablename> limit 100;

I get the following error:

[42000]: Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: {allowQuotedId()}? line 1:117 rule Identifier failed predicate: {allowQuotedId()}?

If I disable the above policy, the query runs successfully.

10 REPLIES 10

Rising Star

which user you are using to run the query ? only the allowed users can do the select * because it includes the the column in the policy.

New Contributor

I am logged-in as a user that is in the "select user". So yes, this user has permissions to run the query.

Contributor

@Navendu Garg: I tried your scenario in my test environment and had no issues accessing column with masking option set to 'redact'. On giving select cmd, the values were shown as xxxxx instead of usual values.

Also, based on https://issues.apache.org/jira/browse/HIVE-6013, I inserted values into my column with ` and ', also set column name with ` and ' but the behavior was fine.

What hive client are you using?

Also, can you let me know what kind of value was in your column? Did the column name or any of column values contain any kind of quotes?

I am assuming that you have an access level hive policy granting ur user atleast select privilege to the table and the columns you are accessing in ur select statement, and then a masking policy which masks one of the columns for the same user running the query.

New Contributor
I've tried it on SQL Workbench and Aginity Workbench, and I get errors on both clients.
Column is a varchar(9).

Here is what I discovered. A table with column names that have spaces (e.g. `account year`), this feature fails. When I added an underscore to the column (account_year), the masking started to work.

Contributor

I tried the above scenario and @Navendu Garg's exact issue is described below:

There is a table say 'default.testtable1' with one of the column names containing a space say 'account name'.

[create table testtable1(Name String,`Account number` String,Age int);

...log/webhcat/webhcat.log.2017-02-14:hive.support.quoted.identifiers=column]

A user say 'user1' has a Hive access policy granting all permissions to user1 on default.testtable1, all columns.

There is also a masking policy which masks column 'name' in same table 'default.testtable1' with mask type='redact' for user1.

select * from default.testtable1 throws:

Error: Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.parse.ParseException: line 1:41 rule Identifier failed predicate: {allowQuotedId()}?
line 1:56 rule Identifier failed predicate: {allowQuotedId()}?
line 1:129 rule Identifier failed predicate: {allowQuotedId()}?
line 1:144 rule Identifier failed predicate: {allowQuotedId()}? (state=42000,code=40000)

Audit shows only masking policy allowed the txn. There is no deny and no access policy mentioned in the audit. The flow didn't reach the access policies auth.

On disabling the masking policy, the select shows correct results.

NOTE: I tried it with mask type as 'Hash','Partial mask','Nullify' and same result.

@Madhan Neethiraj, Will you pls confirm if this is expected - If any of the column names in a table contain spaces and there is a mask policy on any of that table's columns, then select with fail with the above exception.

Contributor

I think Hive folks should look into this one.

CC: @Eugene Koifman, @Wei Zheng

Expert Contributor

@Navendu Garg, could you share your table definition (show create table T)

Contributor

Hi @Eugene Koifman: I was able to reproduce the exact scenario that Navendu mentioned and table create cmd I used was:

[create table testtable1(Name String,`Account number` String,Age int);

...log/webhcat/webhcat.log.2017-02-14:hive.support.quoted.identifiers=column]

New Contributor

I think its a bug in ranger, where quoted.indentifiers are not being honoured.

To reproduce:

  • set hive.support.sql11.reserved.keywords=false
  • create test-user who has restricted access to some rows in ranger (row filter) say, bangalore filter for service_city column.
  • create a table with one of the column is reserved word like "date"
select date from temp_db.test limit 10; //or
select `date` from temp_db.test limit 10;
  • Resolved query by Ranger:
select date from (SELECT *, BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, ROW__ID FROM `temp_db`.`test`  WHERE lower(service_city)='bangalore')`test` limit 10 
  • Exception:
2017-10-09 13:28:27,517 ERROR [HiveServer2-Handler-Pool: Thread-60373]: ql.Driver (SessionState.java:printError(993)) - FAILED: SemanticException org.apache.hadoop.hive.ql.parse.ParseException: line 1:30 Failed to recognize predicate 'date'. Failed rule: 'identifier' in expression specification org.apache.hadoop.hive.ql.parse.SemanticException: org.apache.hadoop.hive.ql.parse.ParseException: line 1:30 Failed to recognize predicate 'date'. Failed rule: 'identifier' in expression specification at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10321) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10436) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1221) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1215) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:226) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:264) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:470) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:457) at sun.reflect.GeneratedMethodAccessor105.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) at com.sun.proxy.$Proxy39.executeStatementAsync(Unknown Source) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:313) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.parse.ParseException: line 1:30 Failed to recognize predicate 'date'. Failed rule: 'identifier' in expression specification at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:205) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10319) 


Workaround:

>hive.support.quoted.identifiers=none;

Fire query> select `date` from temp_db.test limit 10;

Note: select * works without any issues even though the table has columns with reserved words.

Contributor

I was working on something unrelated, but I hit this same error, detailed the issue in Jira, and have proposed a workaround.

 

The issue is that there is a feature in Hive called the REGEX Column Specification.  IMHO this feature was ill conceived and is not standard SQL.  It should be removed from Hive and this issue is yet another reason why.  That's what I was working on when I hit this issue.

 

When Hive looks at the table name surrounded by back ticks, it looks at that string and determines that it is a Regex.  When Hive looks at the table name surrounded by quotes, it looks at that string and determines that it is a Table Name. The basic rule it uses is "most anything ASCII surrounded by back ticks is a Regex."

 

However, when Hive sees the quotes, it sees the string as a table name.  Using quotes (and technically back ticks too, but that's clearly broken) around table names can be allowed/disallowed with a feature in Hive called "hive.support.quoted.identifiers".  This feature is enabled in the user's HS2 session by default.  However, when performing masking, it is a multi step process:

 

  1. The query is parsed by HS2
  2. The masking is applied
  3. The query is parsed again by HS2

The first parsing attempt respects the hive.support.quoted.identifiers configuration and allows a query with quotes to be parsed.  However, the masking code does not pass this configuration information to the parser on the second attempt.  And oddly enough, if the configuration information is not passed along, the parser will consider this feature to be disabled.  So, it's actually on the second pass that it fails because the parser rejects the quotes.

 

For the record, I hit this issue when I removed the Regex feature, because it forced all quoted strings to be considered table names (and subjected to this feature being enabled/disabled) instead of sneaking by as being considered a Regex.  All the masking unit tests failed.

 

https://issues.apache.org/jira/browse/HIVE-23182

https://issues.apache.org/jira/browse/HIVE-23176

 

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.