
Does a Spark job honor Ranger Hive policies?

New Contributor

Hi

I am running a Spark job on a Ranger-enabled HDP cluster. The job reads from one Hive table and writes to another. What I am seeing is that the Ranger Hive policies are not being honored.

Is this the expected behavior of a Spark job with Ranger? Is Spark supported with Ranger?

1 ACCEPTED SOLUTION


That sounds like everything is working as designed/implemented: Ranger does not currently (as of HDP 2.4) have a supported plug-in for Spark, and when Spark reads Hive tables it isn't going through the "front door" of Hive to run queries; it reads the table's files from HDFS directly.

That said, the underlying HDFS authorization policies (with or without Ranger) will be honored if they are in place.
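To make the HDFS-level enforcement concrete, here is a minimal, illustrative Ranger HDFS policy that would gate Spark's direct reads of a table's warehouse directory. The service name, path, and user are hypothetical, and the JSON is a simplified sketch of the policy model rather than a complete export:

```json
{
  "service": "cluster_hadoop",
  "name": "hive-warehouse-sales",
  "resources": {
    "path": { "values": ["/apps/hive/warehouse/sales"], "isRecursive": true }
  },
  "policyItems": [
    {
      "users": ["etl_user"],
      "accesses": [
        { "type": "read", "isAllowed": true },
        { "type": "execute", "isAllowed": true }
      ]
    }
  ]
}
```

A policy like this (or plain HDFS permissions/ACLs on the same path) is what actually decides whether the Spark job can read the table's files, since the Hive-level policies are bypassed.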


2 REPLIES


Explorer

The workaround is to use SQLContext (instead of HiveContext) with JDBC to connect to HiveServer2, which will honor Ranger's authorization policies.
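A rough sketch of that workaround in PySpark terms: build a HiveServer2 JDBC URL and point Spark's generic JDBC data source at it, so the query runs inside HiveServer2 where the Ranger Hive plug-in authorizes it. The hostname, port, and table name below are hypothetical, and the Spark read itself is shown only in comments since it needs a live cluster:

```python
# Sketch: route a Spark read through HiveServer2 via JDBC so Ranger's Hive
# policies are enforced (host/port/table are illustrative placeholders).

def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/db."""
    return "jdbc:hive2://{}:{}/{}".format(host, port, database)

# With an SQLContext in hand, the read would look roughly like:
#   df = sqlContext.read.format("jdbc") \
#       .option("url", hive_jdbc_url("hs2.example.com")) \
#       .option("dbtable", "sales") \
#       .option("driver", "org.apache.hive.jdbc.HiveDriver") \
#       .load()
```

Because the query executes in HiveServer2 rather than by scanning warehouse files directly, the Ranger Hive plug-in sees and authorizes it; the trade-off is that data now flows through HiveServer2 instead of being read in parallel from HDFS.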

The following links give some idea of how Spark, JDBC, and SQLContext work together.

http://stackoverflow.com/questions/32195946/method-not-supported-in-spark

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_dataintegration/content/hive-jdbc-odbc-d...