Hive Ranger plugin REST API calling wrong service name

I just installed Ranger on 2.4.0 to start experimenting with it. I got it basically working and began making policy changes to see how they affect access and to confirm it behaves as I expect. By default, it created three services named nadcluster_hive, nadcluster_hdfs, and nadcluster_yarn, presumably because my cluster name is nadcluster. I decided to rename these to better reflect the services, and renamed nadcluster_hive to jupstats_hive. I then started seeing errors like the ones below in the hiveserver2 log. It seems the hiveserver2 Ranger plugin is still trying to download policy data from a URL like:

http://vmwhaddev01:6080/service/plugins/policies/download/nadcluster_hive

When it should be trying against the newly named service like:

http://vmwhaddev01:6080/service/plugins/policies/download/jupstats_hive

And, in fact, I manually hit the REST end point from a browser with a URL like this and did get a proper response:

http://vmwhaddev01:6080/service/plugins/policies/download/jupstats_hive?lastKnownVersion=4&pluginId=...
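
To make the shape of that request concrete, here is a minimal Python sketch that rebuilds the download URL from its parts. The path and the two query parameters are copied from the URLs and the log line in this post; the function name and its arguments are hypothetical, and this is not the plugin's actual code.

```python
from urllib.parse import quote, urlencode


def policy_download_url(admin_url, service_name, last_known_version, plugin_id):
    """Rebuild the policy download URL seen in the hiveserver2 log.

    Hypothetical helper: the path and parameter names are taken from the
    URLs quoted in this thread, nothing else is assumed about the API.
    """
    base = admin_url.rstrip("/")
    path = "/service/plugins/policies/download/" + quote(service_name, safe="")
    # Keep "@" readable so the result matches the log line character for character.
    query = urlencode(
        {"lastKnownVersion": last_known_version, "pluginId": plugin_id},
        safe="@",
    )
    return f"{base}{path}?{query}"
```

Whichever service name is passed in is the one that ends up in the path, so a plugin that still holds the old name will keep requesting the old URL regardless of what the service is called in the Ranger UI.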

Here's the error in the log:

2016-04-22 04:38:17,290 ERROR [Thread-9]: client.RangerAdminRESTClient (RangerAdminRESTClient.java:getServicePoliciesIfUpdated(81)) - Error getting policies. request=http://vmwhaddev01:6080/service/plugins/policies/download/nadcluster_hive?lastKnownVersion=4&pluginId=hiveServer2@vmwhaddev01-nadcluster_hive, response={"httpStatusCode":400,"statusCode":1,"msgDesc":"Serivce:nadcluster_hive not found","messageList":[{"name":"DATA_NOT_FOUND","rbKey":"xa.error.data_not_found","message":"Data not found"}]}, serviceName=nadcluster_hive 
2016-04-22 04:38:17,290 ERROR [Thread-9]: util.PolicyRefresher (PolicyRefresher.java:loadPolicyfromPolicyAdmin(228)) - PolicyRefresher(serviceName=nadcluster_hive): failed to refresh policies. Will continue to use last known version of policies (4)
java.lang.Exception: Serivce:nadcluster_hive not found
at org.apache.ranger.admin.client.RangerAdminRESTClient.getServicePoliciesIfUpdated(RangerAdminRESTClient.java:83)
at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicyfromPolicyAdmin(PolicyRefresher.java:205)
at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicy(PolicyRefresher.java:175)
at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:154) 
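
For anyone scanning logs for this failure, here is a small Python sketch that pulls the missing service name out of the JSON response body shown in the ERROR line above. The field names (`msgDesc`, `messageList`, `DATA_NOT_FOUND`) are taken from that log line only; the function name is hypothetical, and other Ranger versions may word the message differently.

```python
import json


def missing_service_name(response_body):
    """Return the service name Ranger Admin reported as not found, else None.

    Sketch based solely on the response body quoted in the log above.
    """
    try:
        payload = json.loads(response_body)
    except ValueError:
        return None
    if not isinstance(payload, dict):
        return None
    names = {
        m.get("name")
        for m in payload.get("messageList", [])
        if isinstance(m, dict)
    }
    if "DATA_NOT_FOUND" not in names:
        return None
    # The log shows msgDesc as "Serivce:<name> not found" (spelling as logged).
    _, sep, rest = str(payload.get("msgDesc", "")).partition(":")
    suffix = " not found"
    if not sep or not rest.endswith(suffix):
        return None
    return rest[: -len(suffix)]
```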

And here is when it started working after I renamed the service back to what it was looking for:

2016-04-22 04:38:47,375 INFO  [Thread-9]: util.PolicyRefresher (PolicyRefresher.java:loadPolicyfromPolicyAdmin(218)) - PolicyRefresher(serviceName=nadcluster_hive): found updated version. lastKnownVersion=4; newVersion=18

Questions:

  1. What has to be restarted when you change something in a Ranger policy via the Ranger GUI? I restarted Ranger admin, usersync, and hiveserver2, but the wrong service name was still being used in the plugin URL.
  2. Where is the hiveserver2 Ranger plugin learning this URL in the first place?

[UPDATE]

OK, I spent some more time playing around. I figured out that the policy configuration is located here (note that the service name "nadcluster_hive" appears in both the path and the filename):

/etc/ranger/nadcluster_hive/policycache/hiveServer2_nadcluster_hive.json
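
The path above appears to follow a pattern built from the service name. Here is a minimal Python sketch of that pattern, with a helper for checking whether the file has been rewritten. The `hiveServer2` prefix and the `/etc/ranger` root come from the path above; the function names and the assumption that other services follow the same layout are mine.

```python
import os
from pathlib import PurePosixPath


def policy_cache_path(service_name, app_id="hiveServer2", cache_root="/etc/ranger"):
    """Build the policy cache path for a service name.

    Assumes the layout seen in this post:
    <cache_root>/<service>/policycache/<app_id>_<service>.json
    """
    return (
        PurePosixPath(cache_root)
        / service_name
        / "policycache"
        / f"{app_id}_{service_name}.json"
    )


def cache_rewritten_since(path, since_epoch_seconds):
    """Return True if the file exists and was modified after the given time."""
    try:
        return os.path.getmtime(path) > since_epoch_seconds
    except OSError:
        # A missing or unreadable file counts as "not rewritten".
        return False
```

Recording `time.time()` before saving a policy change and then polling `cache_rewritten_since(policy_cache_path("nadcluster_hive"), t0)` for a minute or so is a scripted version of tailing the file by hand.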

I performed some testing. With the service name in Ranger UI set to "nadcluster_hive", I made various changes to one of the policies, like adding a new user, enabling/disabling table or column permissions, etc. I tailed the json file above and, every time I made a change and saved, within 30 seconds, I would see the json file be rewritten with the updates. Cool. That seems right.

Next, I renamed the service to nadcluster_hive_1 and repeated the tests. The json file never once changed - as expected because the wrong REST URL is being used. But, I would have expected that maybe a brand new json file with the new service name would have appeared with a path like:

/etc/ranger/nadcluster_hive_1/policycache/hiveServer2_nadcluster_hive_1.json

But, it never did. So, is this expected behavior or a bug?

Re: Hive Ranger plugin REST API calling wrong service name

Hi @Mark Petronic

The service name is the connection between the Ranger plugin on the individual node (e.g. namenode, hiveserver, ...) and the Ranger UI. For example, if you go to your namenode and look into the file /etc/hadoop/conf/ranger-hdfs-security.xml, you will find an entry called ranger.plugin.hdfs.service.name, which corresponds to the name in your Ranger UI/configuration. So if you change the service name via the Ranger UI, the Ranger plugin on the node still tries to get the policies for the old name (e.g. nadcluster_hdfs), can't find any service with that name in Ranger (because it's called jupstats_hdfs now), and throws an error.
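
To check which service name a plugin is actually configured with, something like the following Python sketch can read the property out of the XML. It assumes the file uses the usual Hadoop `<configuration>/<property>/<name>/<value>` layout; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_property(xml_text, prop_name):
    """Return the value of a property from Hadoop-style configuration XML.

    Returns None if the property is absent. Assumes the standard
    <configuration><property><name/><value/></property></configuration> layout.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="") or ""
        if name.strip() == prop_name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Example usage on a namenode (path and property name as given above):
#   with open("/etc/hadoop/conf/ranger-hdfs-security.xml") as f:
#       print(hadoop_property(f.read(), "ranger.plugin.hdfs.service.name"))
```

If the value printed there differs from the service name shown in the Ranger UI, the plugin will keep requesting policies for a service that Ranger no longer knows about.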

When you enable the Hive or HDFS plugin via Ambari, the service name will always be <clustername>_<service> (e.g. nadcluster_hdfs, nadcluster_hive). You can manually overwrite the value in the XML file, but be aware that every time you restart your HDFS/Hive service it will change the value back to <clustername>_<service>. There is currently no way to change the service name via Ambari.

If you have to change the service name at all costs, there is only one way (which I am not recommending!): disable the Ranger HDFS/Hive plugin in Ambari and enable it manually.

How to manually enable a Ranger Plugin:

1. Configure /usr/hdp/<version>/ranger-hdfs-plugin/install.properties

2. Set JAVA_HOME

3. Run /usr/hdp/<version>/ranger-hdfs-plugin/enable-hdfs-plugin.sh
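
As a rough illustration of step 1, here is a Python sketch that sets a key in the text of a properties file. The helper is generic and hypothetical. The key names in the usage note (`REPOSITORY_NAME`, `POLICY_MGR_URL`) are my assumption of what install.properties contains and should be checked against the actual file on your node before relying on them.

```python
def set_property(properties_text, key, value):
    """Return properties text with key set to value.

    Replaces the first uncommented "key=..." line, or appends one if the
    key is not present. Comment lines and all other lines are kept as-is.
    """
    out = []
    replaced = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        is_assignment = "=" in stripped and not stripped.startswith("#")
        if (
            not replaced
            and is_assignment
            and stripped.split("=", 1)[0].strip() == key
        ):
            out.append(f"{key}={value}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

For example, `set_property(text, "REPOSITORY_NAME", "jupstats_hdfs")` would point the plugin at a service with that name, assuming that is the key the enable script reads. Steps 2 and 3 are still run by hand as listed above.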

Let me know if that helps

Re: Hive Ranger plugin REST API calling wrong service name

Thanks @Jonas Straub. That helps clarify the issue a bit, but I am still left wondering. The Ranger manager GUI lets you add services with the big fat plus sign icon. Services must have different names, so they cannot all be <cluster>_<service>. Am I understanding this right? Say I have two Hive-based services called A and B, with different users and different policies. I am thinking I would create two services under the Hive section in Ranger, one for A and one for B, to nicely organize these non-overlapping policies, and then define the users and policies for each. Is that not how services are meant to be used, to organize users and policies? Am I off track here? If not, are you saying that the current version of Ranger in Ambari is hobbled and can only support ONE service per service type (one for Hive, one for HDFS, etc.), with the names locked down as you already indicated? I can leave the service names at their defaults for now so that it works; I am just trying to understand the overall configuration concepts of Ranger for future planning.

Re: Hive Ranger plugin REST API calling wrong service name

You can definitely add multiple services in the Ranger UI. For example, I recently had to secure multiple SolrCloud clusters with one Ranger instance. Since every SolrCloud cluster handled its own policies, I had to add one Ranger service for each SolrCloud cluster. I named the Ranger services solrcloud01, solrcloud02, and solrcloud03 (this was not done through Ambari!).

Usually you have one Ranger service for each Hadoop service in your cluster (e.g. hive, hdfs,...), but you could use the same Ranger instance for different clusters. E.g. you could use one Ranger instance for mycluster_dev, mycluster_int, mycluster_prd (not recommending this!) and manage all policies in one place.

The naming convention <cluster>_<service> is only used when you enable the Ranger plugins through Ambari. When you enable the plugins manually (e.g. for Solr there is no Ambari support at the moment) you can choose your own name.