
Spark Security Using LLAP - Spark SQL Query giving error when Ranger policy restricts access

Expert Contributor

Hello - I've implemented Spark security using LLAP, and I'm seeing an error in a specific scenario.

Here is what I did:

1 - I log in to the Spark Thrift Server as user 'spark'.

2 - I created a Ranger policy which specifies that user 'spark' does not have access to the column 'storekey' in the table 'factsales'.

3 - I ran the query: select count(1) from factsales;

The error is shown below.

Please note that this seems to happen only when I try to get a count of rows.

When I run a query to get the values of specific columns, I get the expected result (based on whether or not I have access to the column).

------------------------------- QUERY WHERE I GET THE ERROR (getting the count) --------------------

[alanka01@nwk2-bdp-hadoop-06 ~]$ beeline -u jdbc:hive2://nwk2-bdp-hadoop-08.gdcs-qa.apple.com:10015/default -n spark
Connecting to jdbc:hive2://nwk2-bdp-hadoop-08.gdcs-qa.apple.com:10015/default
Connected to: Spark SQL (version 1.6.2)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
0: jdbc:hive2://nwk2-bdp-hadoop-08.gdcs-qa.ap>

select count(1) from factsales;

Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#402L])
+- TungstenExchange SinglePartition, None
   +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#405L])
      +- Scan LlapRelation(org.apache.spark.sql.hive.llap.LlapContext@32b7eb41,Map(table -> default.factsales, url -> jdbc:hive2://nwk2-bdp-hadoop-06.gdcs-qa.apple.com:10500))[] (state=,code=0)

-------------------------QUERIES WITH EXPECTED RESULT -1 -------------------------------------

0: jdbc:hive2://nwk2-bdp-hadoop-08.gdcs-qa.ap>

select saleskey from factsales limit 10;

+-----------+--+
| saleskey  |
+-----------+--+
| 3343549   |
| 2822385   |
| 2764012   |
| 3289348   |
| 2531906   |
| 3055870   |
| 2530527   |
| 2880758   |
| 2297049   |
| 3356058   |
+-----------+--+

-------------------------QUERIES WITH EXPECTED RESULT -2 (since user 'spark' does not have access to column 'storekey') -------------------------

0: jdbc:hive2://nwk2-bdp-hadoop-09.gdcs-qa.ap>

select saleskey, storekey from factsales limit 10;

Error: java.io.IOException: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query: org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException: Permission denied: user [spark] does not have [SELECT] privilege on [default/factsales/saleskey,storekey] (state=,code=0)

1 ACCEPTED SOLUTION

Super Guru
@Karan Alang

To run count(*) or count(1) you need the SELECT privilege on the whole table. You can still run "select count(<column name>) from <table name>", and that will work for columns you have access to, but count(*) requires SELECT permission on the entire table. This is working as expected.
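
For illustration, a quick sketch of the two cases against the factsales table from this thread (assuming, as the output above shows, that user 'spark' can still read saleskey but not storekey):

-- needs table-level SELECT, so it fails under the column-restricted policy
select count(1) from factsales;

-- counts a column the user is allowed to read, so it should succeed
select count(saleskey) from factsales;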


3 REPLIES

Expert Contributor

@mqureshi - looping you in, any ideas?


Expert Contributor

@mqureshi - I guess what you mentioned makes sense; the error message, however, does not indicate the actual issue.