Ranger security when accessing Hive tables via SparkSQL?

How do we manage authorization control over tables within SparkSQL?

Will Ranger enforce existing Hive policies when these Hive tables are accessed via SparkSQL? If not, what is the recommended approach?

1 ACCEPTED SOLUTION

TL;DR: SparkSQL today provides only table-level access control; it does not provide the column-level access control available in Hive.

Spark reads table metadata from the Hive Metastore and reads the underlying ORC (or Parquet) files directly from HDFS.

Because the files are read directly, HDFS security still applies: read/write access to the files is controlled by HDFS ACLs or Ranger HDFS policies.

Right now, Spark doesn't propagate the end user's identity to the Hive Metastore; we are working in the community to enhance this.
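
To make the HDFS point concrete, here is a minimal Python sketch of the POSIX-style owner/group/other check that HDFS applies to a table's storage directory. It is a simplified model, not HDFS or Ranger code: it ignores extended ACL entries, the HDFS superuser, traversal of parent directories, and the Ranger HDFS plugin, and the path, users, and groups are hypothetical. It also shows why the granularity stops at the table: HDFS sees directories and files, not columns.

```python
from dataclasses import dataclass

# Permission bits within one class (owner, group or other), as in POSIX.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class HdfsDir:
    """A table's storage directory; path, owner and group are hypothetical."""
    path: str
    owner: str
    group: str
    mode: int  # e.g. 0o750 -> owner rwx, group r-x, other ---


def permission_bits(d: HdfsDir, user: str, groups: set) -> int:
    """Pick exactly one class, POSIX-style: owner, else group, else other."""
    if user == d.owner:
        return (d.mode >> 6) & 0o7
    if d.group in groups:
        return (d.mode >> 3) & 0o7
    return d.mode & 0o7


def can_read_table(d: HdfsDir, user: str, groups: set) -> bool:
    # Simplification: reading a table = listing and traversing its directory (r-x).
    bits = permission_bits(d, user, groups)
    return (bits & (READ | EXECUTE)) == (READ | EXECUTE)


def can_write_table(d: HdfsDir, user: str, groups: set) -> bool:
    # Simplification: writing a table = creating files in its directory (-wx).
    bits = permission_bits(d, user, groups)
    return (bits & (WRITE | EXECUTE)) == (WRITE | EXECUTE)


if __name__ == "__main__":
    orders = HdfsDir("/apps/hive/warehouse/sales.db/orders", "hive", "hadoop", 0o750)
    print(can_read_table(orders, "alice", {"hadoop"}))   # True  (group r-x)
    print(can_write_table(orders, "alice", {"hadoop"}))  # False (group has no w)
    print(can_read_table(orders, "bob", {"analysts"}))   # False (other ---)
```

Whatever the Hive policies say, a Spark job running as `bob` is stopped at the file-read step in this model, while `alice` can read every column of the table, because there is no per-column unit at this layer.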

Replies

What steps need to be performed to get table-level access control via Spark using Ranger?

a) Do we enable the Hive plugin?

It would be good if there were some notes or docs showing the steps to enable table-level access control.

If this is used in conjunction with the Hue Livy server, does anything change?

Contributor

"Right now Spark doesn't propagate end user identity to Hive meta store and we are working in the community to enhance this."

I assume Spark runs Hive queries as the hive user. Does this mean that Spark has access to all data stored in Hive even though the Ranger plugin is active?

The nuance is in how you use SparkSQL. If you use spark-shell, the identity of the user launching the shell is used for Hive access. If you use the Spark Thrift Server, the identity used to access Hive data is the identity that launched the Spark Thrift Server.

@vshukla, is there any equivalent of HiveServer2 "doAs" in the Spark Thrift Server?

STS (Spark Thrift Server) doesn't support doAs today, and there is an open ticket for it:

https://issues.apache.org/jira/browse/SPARK-5159

We plan to work in the community to resolve it.
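
The identity behaviour described in this thread can be summarized as a small decision function. This is a minimal Python sketch, not Spark code: the `Session` type, the entry-point labels, and the `do_as` flag are hypothetical. `do_as=True` models what HiveServer2-style doAs impersonation would change; per the thread (SPARK-5159), STS does not support it, so the default reflects the current behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    entry_point: str    # "spark-shell" or "thrift-server" (labels for this sketch only)
    process_owner: str  # user that launched the shell or the Thrift server
    client_user: str    # end user who submitted the query


def effective_identity(s: Session, do_as: bool = False) -> str:
    """Identity that HDFS and the Hive Metastore see for this session's queries."""
    if s.entry_point == "spark-shell":
        # The user who launches the shell is the identity used for Hive access.
        return s.process_owner
    if s.entry_point == "thrift-server":
        # Hypothetical doAs branch: impersonate the connected end user.
        # Without it (current STS behaviour), every client runs as the server's launcher.
        return s.client_user if do_as else s.process_owner
    raise ValueError(f"unknown entry point: {s.entry_point}")


if __name__ == "__main__":
    print(effective_identity(Session("spark-shell", "alice", "alice")))  # alice
    sts = Session("thrift-server", "hive", "bob")
    print(effective_identity(sts))              # hive
    print(effective_identity(sts, do_as=True))  # bob
```

With STS as it stands, queries from every connected user reach HDFS and the Metastore under the launching user's identity, so per-user HDFS ACLs or Ranger policies cannot distinguish one client from another.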

New Contributor

Hi,

Can you please provide the steps that need to be performed to get table-level access control via Spark using Ranger?

Thanks and regards,

Shyam

New Contributor

Has anyone implemented Ranger in Spark SQL? Any instructions on this would be greatly helpful. I did the setup, but the policies have no effect when a user runs queries. I am missing something.