New Contributor
Posts: 2
Registered: ‎06-24-2015

Incompatibility between HiveServer2(required for Sentry) and Spark SQL

CDH version: 5.3

Spark Version: 1.2.0

 

I am trying to run a Hive query from Spark (Spark SQL), using the HiveContext class in the Cloudera Spark distribution. It was working fine earlier, but now that Sentry is ON it throws a permission exception:

 

org.apache.hadoop.security.AccessControlException: Permission denied: user=kakn, access=READ_EXECUTE, inode="/user/hive/warehouse/rt_freewheel_mastering.db/digital_profile_cluster_in":hive:hive:drwxrwx--t

 

It seems that the query execution uses the Hive CLI and does not go through HiveServer2, which translates the invoking user to "hive". That is the only user with permission to access the table directories under the "hive/warehouse/" directory.

 

Is there a way to make the Hive query execution go through HiveServer2?
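
For anyone reading the exception above, here is a rough Python sketch that pulls the fields out of the message and shows which permission bits apply to the calling user. The helper names (parse_denial, applicable_bits) and the user_groups parameter are made up for illustration. It also assumes that user kakn is not a member of the hive group, which the post does not state but the denial suggests.

```python
import re

# Matches the HDFS "Permission denied" message format quoted in the post.
PATTERN = re.compile(
    r'user=(?P<user>[^,]+), access=(?P<access>[A-Z_]+), '
    r'inode="(?P<path>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>\S+)'
)

def parse_denial(message):
    """Return the user, access, path, owner, group and mode as a dict."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognised permission-denied message")
    return match.groupdict()

def applicable_bits(info, user_groups=()):
    """Pick the owner, group or other triple that applies to the user.

    user_groups is a hypothetical input: the caller's group memberships,
    which the exception message itself does not contain.
    """
    mode = info["mode"]              # e.g. "drwxrwx--t"
    if info["user"] == info["owner"]:
        return mode[1:4]             # owner triple
    if info["group"] in user_groups:
        return mode[4:7]             # group triple
    return mode[7:10]                # other triple

message = (
    'org.apache.hadoop.security.AccessControlException: Permission denied: '
    'user=kakn, access=READ_EXECUTE, '
    'inode="/user/hive/warehouse/rt_freewheel_mastering.db/digital_profile_cluster_in"'
    ':hive:hive:drwxrwx--t'
)

info = parse_denial(message)
bits = applicable_bits(info)         # assumption: kakn is not in group hive

can_read = bits[0] == "r"
# A lowercase "t" in the last position means sticky bit plus execute.
can_execute = bits[2] in ("x", "s", "t")

print(info["user"], "->", bits, "read:", can_read, "execute:", can_execute)
```

Under that assumption the "other" triple applies, so kakn can traverse the directory but cannot list it, and READ_EXECUTE needs both. Calling applicable_bits(info, user_groups=("hive",)) instead selects the group triple, which is why the same query succeeds when it runs as a user in the hive group.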

New Contributor
Posts: 2
Registered: ‎06-24-2015

Re: Incompatibility between HiveServer2(required for Sentry) and Spark SQL

Any suggestions will be appreciated!!
Cloudera Employee
Posts: 318
Registered: ‎01-16-2014

Re: Incompatibility between HiveServer2(required for Sentry) and Spark SQL

I do not think that this has been tested, or even considered as a use case, at the moment.

There is no support for Spark SQL in CDH yet, and the HiveContext is part of the SQL side of the product.

 

I am therefore not sure that this can or will work at the moment.

 

Wilfred

Cloudera Employee
Posts: 1
Registered: ‎09-03-2015

Re: Incompatibility between HiveServer2(required for Sentry) and Spark SQL

Currently, Sentry locks down the Hive metastore by allowing only specific users to access it, with hive being one of them. If you want to grant your application-specific account the same privilege to work with the metastore as hive has, you can add it using Cloudera Manager. But be warned that this can potentially create a loophole in security, so it is recommended to tread with care.