Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Spark Security Using LLAP - Spark SQL Query giving error when Ranger policy restricts access

Expert Contributor

Hello - I've implemented Spark security using LLAP and am seeing an error in a specific scenario.

Here is what was done:

1 - Logged in to the Spark Thrift server as user 'spark'.

2 - Created a Ranger policy which specifies that user 'spark' does not have access to column 'storekey' in table 'factsales'.

3 - Fired the query -> select count(1) from factsales;

Error is as shown below.

Please note - this seems to happen only when I try to get a count of rows.

When I fire a query to get values of specific columns, I get the expected result (based on whether I have access to the column or not).
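For reference, a column restriction like the one in step 2 is typically built in Ranger as an allow policy whose column list excludes 'storekey'. A rough sketch of such a policy in Ranger's REST API JSON is below; the service name and policy name are illustrative, not taken from this thread:

```json
{
  "service": "cl1_hive",
  "name": "factsales-no-storekey-for-spark",
  "isEnabled": true,
  "resources": {
    "database": { "values": ["default"],   "isExcludes": false },
    "table":    { "values": ["factsales"], "isExcludes": false },
    "column":   { "values": ["storekey"],  "isExcludes": true }
  },
  "policyItems": [
    {
      "users": ["spark"],
      "accesses": [ { "type": "select", "isAllowed": true } ]
    }
  ]
}
```

With "isExcludes": true on the column resource, the policy grants SELECT on every column of default.factsales except storekey.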

------------------------------- QUERY WHERE I GET ERROR (in getting count) --------------------

[alanka01@nwk2-bdp-hadoop-06 ~]$ beeline -u jdbc:hive2://nwk2-bdp-hadoop-08.gdcs-qa.apple.com:10015/default -n spark
Connecting to jdbc:hive2://nwk2-bdp-hadoop-08.gdcs-qa.apple.com:10015/default
Connected to: Spark SQL (version 1.6.2)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
0: jdbc:hive2://nwk2-bdp-hadoop-08.gdcs-qa.ap>

select count(1) from factsales;

Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#402L])
+- TungstenExchange SinglePartition, None
   +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#405L])
      +- Scan LlapRelation(org.apache.spark.sql.hive.llap.LlapContext@32b7eb41,Map(table -> default.factsales, url -> jdbc:hive2://nwk2-bdp-hadoop-06.gdcs-qa.apple.com:10500))[] (state=,code=0)

-------------------------QUERIES WITH EXPECTED RESULT -1 -------------------------------------

0: jdbc:hive2://nwk2-bdp-hadoop-08.gdcs-qa.ap>

select saleskey from factsales limit 10;

+-----------+--+
| saleskey  |
+-----------+--+
| 3343549   |
| 2822385   |
| 2764012   |
| 3289348   |
| 2531906   |
| 3055870   |
| 2530527   |
| 2880758   |
| 2297049   |
| 3356058   |
+-----------+--+

-------------------------QUERIES WITH EXPECTED RESULT - 2 (since user 'spark' does not have access to column 'storekey') -------------------------------------

0: jdbc:hive2://nwk2-bdp-hadoop-09.gdcs-qa.ap>

select saleskey, storekey from factsales limit 10;

Error: java.io.IOException: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query: org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException: Permission denied: user [spark] does not have [SELECT] privilege on [default/factsales/saleskey,storekey] (state=,code=0)

1 ACCEPTED SOLUTION

Super Guru
@Karan Alang

To run count(*) you need SELECT privilege on the whole table. You can still run "select count(<column name>) from <table name>" on a column you have access to, and that will work, but count(*) (and count(1)) requires SELECT permission on the entire table. This is working as expected.
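As a concrete illustration, using the table and column names from this thread:

```sql
-- Works: user 'spark' has SELECT on the saleskey column
select count(saleskey) from factsales;

-- Fails: count(1)/count(*) needs SELECT on every column of factsales,
-- and 'spark' is denied access to storekey
select count(1) from factsales;
```

One caveat when using this workaround: count(<column>) skips rows where that column is NULL, while count(*) counts all rows, so the two are only equivalent when the counted column has no NULLs.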

View solution in original post

3 REPLIES 3

Expert Contributor

@mqureshi - looping you in, any ideas?

Super Guru
@Karan Alang

To run count(*) you need SELECT privilege on the whole table. You can still run "select count(<column name>) from <table name>" on a column you have access to, and that will work, but count(*) (and count(1)) requires SELECT permission on the entire table. This is working as expected.

Expert Contributor

@mqureshi - I guess what you mentioned makes sense; the error message, however, does not indicate the actual issue.