Support Questions

Find answers, ask questions, and share your expertise
Celebrating as our community reaches 100,000 members! Thank you!

Querying a Hive Table (via Hiveserver2) with Column Masking enabled via Ranger Hive Plugin returns with an error.


I am running a HDP 2.5.3 cluster with HiveServer2 enabled on an edge node. Ranger is enforcing security. In addition, I've created a policy in ranger that redacts a certain column with pii on select.


Unfortunately, when I execute the following query in a hive client:

select * from <tablename> limit 100;

I get the following error:

[42000]: Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.parse.ParseException: line 1:62 rule Identifier failed predicate: {allowQuotedId()}? line 1:74 rule Identifier failed predicate: {allowQuotedId()}? line 1:94 rule Identifier failed predicate: {allowQuotedId()}? line 1:117 rule Identifier failed predicate: {allowQuotedId()}?

If I disable the above policy, the query runs successfully.


Expert Contributor

I was working on something unrelated, but I hit this same error, detailed the issue in Jira, and have proposed a workaround.


The issue is that there is a feature in Hive called the REGEX Column Specification.  IMHO this feature was ill conceived and is not standard SQL.  It should be removed from Hive and this issue is yet another reason why.  That's what I was working on when I hit this issue.


When Hive looks at the table name surrounded by back ticks, it looks at that string and determines that it is a Regex.  When Hive looks at the table name surrounded by quotes, it looks at that string and determines that it is a Table Name. The basic rule it uses is "most anything ASCII surrounded by back ticks is a Regex."


However, when Hive sees the quotes, it sees the string as a table name.  Using quotes (and technically back ticks too, but that's clearly broken) around table names can be allowed/disallowed with a feature in Hive called "".  This feature is enabled in the user's HS2 session by default.  However, when performing masking, it is a multi step process:


  1. The query is parsed by HS2
  2. The masking is applied
  3. The query is parsed again by HS2

The first parsing attempt respects the configuration and allows a query with quotes to be parsed.  However, the masking code does not pass this configuration information to the parser on the second attempt.  And oddly enough, if the configuration information is not passed along, the parser will consider this feature to be disabled.  So, it's actually on the second pass that it fails because the parser rejects the quotes.


For the record, I hit this issue when I removed the Regex feature, because it forced all quoted strings to be considered table names (and subjected to this feature being enabled/disabled) instead of sneaking by as being considered a Regex.  All the masking unit tests failed.