
Does Hive cache Apache Ranger policies periodically to provide authorization, or does it hit Ranger for every request that needs to be authorized?

Explorer

I want to know whether Hive syncs the Ranger policies and uses that local cache to provide authorization, or whether it hits Ranger for every request that needs to be authorized.

1 ACCEPTED SOLUTION

Super Collaborator
@Shashank V C

All plugins that use Ranger as an authorization module cache the policies locally and use that cache for authorization. Below is an excerpt from the Apache Ranger overview:

Plugins are lightweight Java programs which embed within processes of each cluster component. For example, the Apache Ranger plugin for Apache Hive is embedded within Hiveserver2. These plugins pull in policies from a central server and store them locally in a file. When a user request comes through the component, these plugins intercept the request and evaluate it against the security policy. Plugins also collect data from the user request and follow a separate thread to send this data back to the audit server.

Reference: https://hortonworks.com/apache/ranger/#section_2

PS: Please mark the answer if you find it correct 🙂
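
To make the excerpt concrete, here is a minimal sketch of the idea that requests are evaluated against a locally held copy of the policies rather than by calling the central server. The class and field names (`LocalPolicyEvaluator`, `Policy`, `replacePolicies`) are hypothetical and heavily simplified; they are not Ranger's actual classes, and real Ranger policies carry much more (deny rules, wildcards, groups, conditions).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a plugin-side policy cache (not Ranger's API).
class LocalPolicyEvaluator {

    // One allow rule: these users may perform these access types on this resource.
    static final class Policy {
        final String resource;
        final Set<String> users;
        final Set<String> accessTypes;

        Policy(String resource, Set<String> users, Set<String> accessTypes) {
            this.resource = resource;
            this.users = users;
            this.accessTypes = accessTypes;
        }
    }

    // Policies held in memory after being pulled from the central server.
    private volatile List<Policy> cachedPolicies = Collections.emptyList();

    // Called by a background refresher when a newer policy set has been downloaded.
    void replacePolicies(List<Policy> latest) {
        cachedPolicies = Collections.unmodifiableList(new ArrayList<>(latest));
    }

    // Evaluates a request against the cached policies only; no network call is made.
    boolean isAllowed(String user, String resource, String accessType) {
        for (Policy p : cachedPolicies) {
            if (p.resource.equals(resource)
                    && p.users.contains(user)
                    && p.accessTypes.contains(accessType)) {
                return true;
            }
        }
        return false; // deny when no cached policy grants the access
    }
}
```

Because the list reference is swapped as a whole, a request sees either the old or the new policy set in full, never a partly updated one.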



6 REPLIES


Explorer

@Chiran Ravani Thank you so much. Do you know how the policies are pulled from Ranger? Is it Thrift communication or something else?

Super Collaborator

It is a REST call to Ranger Admin. The property ranger.plugin.<plugin_name>.policy.rest.url is used to communicate with Ranger Admin.

For example, Hive uses ranger.plugin.hive.policy.rest.url. By default, the plugin polls Ranger Admin every 30 seconds to check whether the currently cached policy has changed; if it has, the plugin downloads the new policy and caches it.

The default policy cache location is /etc/ranger/<CLUSTER_NAME>_<PLUGIN_COMPONENT_NAME>/policycache on the host where the service is running.

For example, /etc/ranger/hdptest_hive/policycache on HiveServer2 for my cluster.
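
The poll-and-cache cycle described above can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's actual code: `PolicyRefresher`, `PolicySource`, `latestVersion`, and `downloadPolicies` are hypothetical names, the `PolicySource` interface stands in for the REST call to Ranger Admin, and the version comparison is one assumed way of detecting that the policy has changed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a periodic policy refresher (not Ranger's actual class).
class PolicyRefresher {

    // Stand-in for the REST call to Ranger Admin; names are assumptions.
    interface PolicySource {
        long latestVersion() throws IOException;
        String downloadPolicies() throws IOException; // policy document as text
    }

    private final PolicySource source;
    private final Path cacheFile;
    private long cachedVersion = -1;        // -1 means nothing has been cached yet
    private volatile String cachedPolicies; // in-memory copy used for authorization

    PolicyRefresher(PolicySource source, Path cacheFile) {
        this.source = source;
        this.cacheFile = cacheFile;
    }

    // One poll cycle: download and persist only when the version has changed.
    // Returns true if the cache was updated.
    synchronized boolean pollOnce() throws IOException {
        long latest = source.latestVersion();
        if (latest == cachedVersion) {
            return false; // nothing changed, keep using the current cache
        }
        String policies = source.downloadPolicies();
        Files.write(cacheFile, policies.getBytes(StandardCharsets.UTF_8));
        cachedPolicies = policies;
        cachedVersion = latest;
        return true;
    }

    String cachedPolicies() {
        return cachedPolicies;
    }

    // Polls every 30 seconds, matching the default interval mentioned above.
    ScheduledExecutorService start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (IOException e) {
                // Server unreachable: keep serving the last cached policies.
            }
        }, 0, 30, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Writing the downloaded policies to a file as well as keeping them in memory means the service still has a policy set to start from if the central server cannot be reached.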

Explorer

Thanks Chiran, one last thing: do you know which .java file in Hive makes this REST call? I'm trying to understand the flow from the initial REST call, through the build-up of the cache, to the successive hits on the cache, so it would help me a lot to get the name of the class where the REST call happens. I checked PrivilegeSynchronizer.java, but I'm not sure whether that's the right place.

Super Collaborator

You're welcome. I would start with RangerAdminRESTClient.java.

New Contributor

Hi Chiran,

I understand the details above. However, I have one question: which user makes the call to ranger.plugin.hive.policy.rest.url, both the first time and on subsequent attempts?

In my case, Ranger is installed on one server and the cluster runs on separate servers. What authentication, and which user, does the REST client use when they talk to each other?

In the Ranger audits I could see curl entries for the "admin" user, but no further attempts to authenticate.