Ranger security when accessing Hive tables via SparkSQL?

How do we manage authorization control over tables within SparkSQL?

Will Ranger enforce existing Hive policies when these Hive tables are accessed via SparkSQL? If not, what is the recommended approach?

Accepted Solutions

Re: Ranger security when accessing Hive tables via SparkSQL?

TL;DR: SparkSQL today provides table-level access control, but not the column-level access control that Hive provides.

Spark reads from both the Hive metastore and the ORC (or Parquet) files directly in HDFS.

For ORC files, security at the HDFS level still applies, so read/write access is controlled by HDFS ACLs or Ranger.

Right now Spark doesn't propagate the end user's identity to the Hive metastore, and we are working in the community to enhance this.
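As a rough illustration of the HDFS-level control mentioned in the answer above, here is a minimal Python sketch that only builds the argument list for an `hdfs dfs -setfacl` call granting a user read access to a table directory. The user name `alice` and the warehouse path are hypothetical placeholders, not values from this thread; check where your tables actually live before running anything like this, and note that a Ranger HDFS policy may be the preferred way to grant the same access on your cluster.

```python
import shlex


def build_setfacl_command(principal, path, perms="r-x", recursive=True, is_group=False):
    """Build the argv list for `hdfs dfs -setfacl -m <acl_spec> <path>`.

    Nothing is executed here; the caller decides whether to run the command.
    """
    # An HDFS ACL permission string is three characters: r or -, w or -, x or -.
    if len(perms) != 3 or any(c not in (flag, "-") for c, flag in zip(perms, "rwx")):
        raise ValueError(f"invalid permission string: {perms!r}")

    kind = "group" if is_group else "user"
    cmd = ["hdfs", "dfs", "-setfacl"]
    if recursive:
        cmd.append("-R")  # apply to the directory and everything under it
    cmd += ["-m", f"{kind}:{principal}:{perms}", path]
    return cmd


# Hypothetical example: user and table location are assumptions for illustration.
cmd = build_setfacl_command("alice", "/apps/hive/warehouse/sales.db/orders")
print(shlex.join(cmd))
```

Because Spark reads the files directly, this grants or denies access to the whole table directory; it cannot restrict individual columns.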

Replies

Re: Ranger security when accessing Hive tables via SparkSQL?

What steps need to be performed to get table-level access control via Spark using Ranger?

a) Do we enable the Hive plugin?

It would be good to have some notes or docs that show the steps to enable table-level access control.

Does anything change when this is used in conjunction with the Hue Livy server?

Re: Ranger security when accessing Hive tables via SparkSQL?

"Right now Spark doesn't propagate end user identity to Hive meta store and we are working in the community to enhance this."

I assume Spark runs Hive queries as the hive user. Does this mean that Spark has access to all data stored in Hive even though the Ranger plugin is active?

Re: Ranger security when accessing Hive tables via SparkSQL?

The nuance is how you use SparkSQL. If you use spark-shell, the identity of the user launching the shell is used for Hive access. If you use the Spark Thrift Server, the identity used to access Hive data is the identity that launched the Spark Thrift Server.
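The rule described above can be written down as a toy model. This is only an illustration of the behaviour stated in this reply, not Spark code: the function name, the entry-point labels, and the user names are all made up for the example.

```python
def effective_hive_identity(entry_point, launched_by, connected_as=None):
    """Return the user whose permissions apply when Hive data is read.

    Toy model of the behaviour described in this thread:
    - spark-shell: the user who launched the shell.
    - Spark Thrift Server: the user who launched the server; the user who
      connects to it (connected_as) does not change the identity.
    """
    if entry_point in ("spark-shell", "spark-thrift-server"):
        # In both cases only the launching identity matters.
        return launched_by
    raise ValueError(f"unknown entry point: {entry_point!r}")


# Hypothetical users: alice is an end user, spark is a service account.
print(effective_hive_identity("spark-shell", launched_by="alice"))
print(effective_hive_identity("spark-thrift-server", launched_by="spark", connected_as="alice"))
```

In the second call the data is read as `spark`, so every user connecting through that server gets whatever access the service account has.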

Re: Ranger security when accessing Hive tables via SparkSQL?

@vshukla is there any equivalent of hiveserver2 "doAs" in Spark Thrift Server?

Re: Ranger security when accessing Hive tables via SparkSQL?

STS today doesn't support doAs and there is an open ticket for it.

https://issues.apache.org/jira/browse/SPARK-5159

We plan to work in the community to resolve it.

Re: Ranger security when accessing Hive tables via SparkSQL?

Hi,

Can you please provide the steps that need to be performed to get table-level access control via Spark using Ranger?

Thanks and Regards

Shyam

Re: Ranger security when accessing Hive tables via SparkSQL?

Has anyone implemented Ranger with Spark SQL? Any instructions on this would be greatly helpful. I did the setup, but the policies have no effect when a user runs queries. I must be missing something.
