CDH version: 5.3
Spark version: 1.2.0
I am trying to run a Hive query from Spark (Spark SQL), through the HiveContext class in the Cloudera Spark distribution. It was working fine earlier, but now that Sentry is on, it throws a permission exception:
org.apache.hadoop.security.AccessControlException: Permission denied: user=kakn, access=READ_EXECUTE, inode="/user/hive/warehouse/rt_freewheel_mastering.db/digital_profile_cluster_in":hive:hive:drwxrwx--t
It seems that the query execution uses the Hive CLI and does not go through HiveServer2. HiveServer2 is what translates the invoking user to "hive", which is the only user with permission to access the table directories under the "hive/warehouse/" directory.
Is there a way to make the Hive query execution go through HiveServer2?
I do not think this has been tested, or even considered as a use case, at the moment.
There is no support for Spark SQL yet in CDH. The HiveContext is part of the SQL side of the product.
I am therefore not sure that this can or will work at the moment.
Currently, Sentry locks down the Hive metastore by allowing only specific users to access it, hive being one of them. If you want to give your application-specific account the same privilege to work with the metastore that hive has, you can add it using Cloudera Manager. Be warned that this can potentially create a loophole in security, so it is recommended to tread with care.
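As a rough sketch of what that change amounts to: the fragment below assumes the relevant setting is the `sentry.metastore.service.users` property in the metastore's hive-site.xml (in Cloudera Manager this would normally be set through the Hive service configuration rather than by editing the file by hand). The property name should be confirmed against the documentation for your CDH release, and `spark_app` is a hypothetical application account used only for illustration.

```xml
<!-- Assumption: this property lists the users allowed to bypass Sentry
     authorization in the Hive metastore. Verify the name for your CDH release. -->
<property>
  <name>sentry.metastore.service.users</name>
  <!-- Keep every entry that is already present (hive at minimum) and append
       the application account; "spark_app" is a placeholder, not a real default. -->
  <value>hive,spark_app</value>
</property>
```

Any account added to this list can reach the metastore without Sentry checks, which is exactly the loophole mentioned above, so the list should stay as short as possible.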