Created 07-26-2016 08:12 AM
Hi
I am running a Spark job on a Ranger-enabled HDP cluster. This Spark job reads from one Hive table and writes to another Hive table. What I am seeing is that the Ranger Hive policies are not being honored.
Is this the expected behavior of a Spark job with Ranger? Is Spark supported with Ranger?
Created 07-26-2016 03:23 PM
That sounds like everything is working as designed/implemented: Ranger does not currently (as of HDP 2.4) have a supported plug-in for Spark, and when Spark reads Hive tables it is not going through the "front door" of Hive (HiveServer2) to actually run queries; it reads the table's files directly from HDFS.
That said, the underlying HDFS authorization policies (with or without Ranger) will be honored if they are in place.
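To make that concrete, here is a minimal sketch (Spark 1.x shell, with a hypothetical table db.sales): HiveContext only uses the metastore to find the table's location, then reads the underlying files straight from HDFS, so only HDFS permissions (POSIX/ACLs or Ranger HDFS policies) on the warehouse directory are checked, not Ranger Hive policies.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("DirectHiveRead"))
    val hiveContext = new HiveContext(sc)

    // Spark resolves db.sales via the metastore, then reads its files
    // directly from HDFS -- HiveServer2 (and its Ranger Hive plug-in) is
    // never involved, so only HDFS-level authorization is enforced.
    val df = hiveContext.sql("SELECT * FROM db.sales")
    df.show()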
Created 11-17-2016 07:54 AM
The workaround is to use SQLContext (instead of HiveContext) with JDBC to connect to HiveServer2, which will honor Ranger's authorization policies.
The following link gives some idea of how Spark, JDBC, and SQLContext work together.
http://stackoverflow.com/questions/32195946/method-not-supported-in-spark
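A rough sketch of that JDBC approach is below (the HiveServer2 URL, table name, and user are hypothetical; on a kerberized cluster the JDBC URL would also need a principal). Because the query executes inside HiveServer2, the Ranger Hive plug-in authorizes it against the connecting user. Note that the Hive JDBC driver does not implement every method Spark's JDBC source expects, which appears to be what the linked thread is about, so expect some rough edges.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("RangerAwareHiveRead"))
    val sqlContext = new SQLContext(sc)

    // Read through HiveServer2 over JDBC; the query runs server-side,
    // where Ranger Hive policies are enforced for the connecting user.
    val df = sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:hive2://hs2-host:10000/default")  // hypothetical HiveServer2 endpoint
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .option("dbtable", "db.sales")                          // hypothetical source table
      .option("user", "etl_user")                             // user whose Ranger policies apply
      .load()
    df.show()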