Ranger security when accessing Hive tables via SparkSQL?

How do we manage authorization control over tables within SparkSQL?

Will Ranger enforce existing Hive policies when these Hive tables are accessed via SparkSQL? If not, what is the recommended approach?

1 ACCEPTED SOLUTION

Re: Ranger security when accessing Hive tables via SparkSQL?

TL;DR: SparkSQL today provides table-level access control; it doesn't provide the column-level access control that Hive does.

Spark reads from both the Hive Metastore and the ORC (or Parquet) files directly in HDFS.

For ORC files, security at the HDFS layer still applies, so read/write access is controlled by HDFS ACLs or Ranger.

Right now Spark doesn't propagate the end user's identity to the Hive Metastore; we are working in the community to enhance this.
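
To make the table-level vs. column-level point concrete, here is a toy Python model. It is not Spark, HDFS, or Ranger code; the paths, users, columns, and the `can_read` / `visible_columns` helpers are all made up for illustration. It assumes access is decided per HDFS path, which is why a user who may read a table's files sees every column in them.

```python
# Toy model only: permissions are attached to HDFS paths, not to columns.
# All paths, user names, and column names below are hypothetical.

# path -> set of users allowed to read it (stand-in for HDFS ACLs / Ranger)
READ_ACL = {
    "/apps/hive/warehouse/sales": {"alice"},
    "/apps/hive/warehouse/hr": {"alice", "bob"},
}

# table -> (location, columns), the kind of metadata a metastore would hold
TABLES = {
    "sales": ("/apps/hive/warehouse/sales", ["order_id", "customer", "amount"]),
    "hr": ("/apps/hive/warehouse/hr", ["emp_id", "name", "salary"]),
}


def can_read(user, table):
    """Return True if `user` may read the files backing `table`."""
    location, _columns = TABLES[table]
    return user in READ_ACL.get(location, set())


def visible_columns(user, table):
    """File-level checks are all-or-nothing: every column or none."""
    _location, columns = TABLES[table]
    return list(columns) if can_read(user, table) else []


if __name__ == "__main__":
    print(visible_columns("bob", "hr"))     # bob can read the hr files
    print(visible_columns("bob", "sales"))  # bob has no access to sales
```

In this model there is no way to let bob read `hr` while hiding `salary`, which is the gap relative to Hive's column-level policies.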

9 REPLIES

Re: Ranger security when accessing Hive tables via SparkSQL?

What steps need to be performed to get table-level access control via Spark using Ranger?

a) Do we enable the Hive plugin?

It would be good to have some notes or docs that show the steps to enable table-level access control.

If this is used in conjunction with the Hue Livy server, does anything change?

Re: Ranger security when accessing Hive tables via SparkSQL?

"Right now Spark doesn't propagate end user identity to Hive meta store and we are working in the community to enhance this."

I assume Spark runs Hive queries as the hive user. Does this mean that Spark has access to all data stored in Hive even though the Ranger plugin is active?

Re: Ranger security when accessing Hive tables via SparkSQL?

The nuance is in how you use SparkSQL. If you use spark-shell, the identity of the user launching the shell is used for Hive access. If you use the Spark Thrift Server, the identity used to access Hive data is the identity used to launch the Spark Thrift Server.
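
A minimal Python sketch of that rule as described in this thread. This is plain Python, not a Spark API; `effective_hive_user` and its parameters are hypothetical names used only to spell out the logic.

```python
def effective_hive_user(entry_point, launch_user, connecting_user=None):
    """Return the identity that ends up accessing Hive data.

    entry_point:     "spark-shell" or "spark-thrift-server"
    launch_user:     user who started the shell or the server
    connecting_user: JDBC/ODBC client user (only meaningful for the server)
    """
    if entry_point == "spark-shell":
        # The shell runs as whoever launched it.
        return launch_user
    if entry_point == "spark-thrift-server":
        # No impersonation: the connecting user is ignored and the
        # server's own launch identity is used for every query.
        return launch_user
    raise ValueError(f"unknown entry point: {entry_point!r}")
```

So `effective_hive_user("spark-thrift-server", "spark", "alice")` returns `"spark"`, not `"alice"`: whatever access the server's launch user has is what every connected client gets.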

Re: Ranger security when accessing Hive tables via SparkSQL?

@vshukla is there any equivalent of HiveServer2 "doAs" in the Spark Thrift Server?

Re: Ranger security when accessing Hive tables via SparkSQL?

STS (Spark Thrift Server) today doesn't support doAs, and there is an open ticket for it:

https://issues.apache.org/jira/browse/SPARK-5159

We plan to work in the community to resolve it.

Re: Ranger security when accessing Hive tables via SparkSQL?

Hi,

Can you please provide the steps that need to be performed to get table-level access control via Spark using Ranger?

Thanks and Regards

Shyam

Re: Ranger security when accessing Hive tables via SparkSQL?

Has anyone implemented Ranger with Spark SQL? Any instructions on this would be greatly helpful. I did the setup, but the policies have no effect when a user runs queries. I must be missing something.