Created 04-22-2016 04:51 AM
Just installed Ranger on 2.4.0 to start experimenting with it. I basically got it working and started making policy changes to see how they affect access and to confirm it works as I anticipate. By default, it created three services named nadcluster_hive, nadcluster_hdfs, and nadcluster_yarn, I suppose because my cluster name is nadcluster. I decided to rename these to better reflect the service, so I renamed nadcluster_hive to jupstats_hive. I then started to see errors in the hiveserver2 log like the one below. It seems the hiveserver2 Ranger plugin is still trying to download policy data from a URL like:
http://vmwhaddev01:6080/service/plugins/policies/download/nadcluster_hive
when it should be requesting the newly named service, like:
http://vmwhaddev01:6080/service/plugins/policies/download/jupstats_hive
In fact, I manually hit the REST endpoint from a browser with a URL like this and did get a proper response:
http://vmwhaddev01:6080/service/plugins/policies/download/jupstats_hive?lastKnownVersion=4&pluginId=...
Here's the error in the log:
2016-04-22 04:38:17,290 ERROR [Thread-9]: client.RangerAdminRESTClient (RangerAdminRESTClient.java:getServicePoliciesIfUpdated(81)) - Error getting policies. request=http://vmwhaddev01:6080/service/plugins/policies/download/nadcluster_hive?lastKnownVersion=4&pluginId=hiveServer2@vmwhaddev01-nadcluster_hive, response={"httpStatusCode":400,"statusCode":1,"msgDesc":"Serivce:nadcluster_hive not found","messageList":[{"name":"DATA_NOT_FOUND","rbKey":"xa.error.data_not_found","message":"Data not found"}]}, serviceName=nadcluster_hive
2016-04-22 04:38:17,290 ERROR [Thread-9]: util.PolicyRefresher (PolicyRefresher.java:loadPolicyfromPolicyAdmin(228)) - PolicyRefresher(serviceName=nadcluster_hive): failed to refresh policies. Will continue to use last known version of policies (4)
java.lang.Exception: Serivce:nadcluster_hive not found
    at org.apache.ranger.admin.client.RangerAdminRESTClient.getServicePoliciesIfUpdated(RangerAdminRESTClient.java:83)
    at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicyfromPolicyAdmin(PolicyRefresher.java:205)
    at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicy(PolicyRefresher.java:175)
    at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:154)
And here is when it started working after I renamed the service back to what it was looking for:
2016-04-22 04:38:47,375 INFO [Thread-9]: util.PolicyRefresher (PolicyRefresher.java:loadPolicyfromPolicyAdmin(218)) - PolicyRefresher(serviceName=nadcluster_hive): found updated version. lastKnownVersion=4; newVersion=18
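For reference, the request URL in the log can be rebuilt from the service name, which makes it easy to see why a rename on the Ranger side breaks the plugin. This is only a sketch: the path and query parameters are copied from the log line above, the pluginId shape (appId@host-serviceName) is inferred from that same line, and the helper names are made up for illustration.

```python
from urllib.parse import quote, urlencode


def plugin_id(app_id, host, service_name):
    # Shape inferred from the log line: hiveServer2@vmwhaddev01-nadcluster_hive
    return f"{app_id}@{host}-{service_name}"


def policy_download_url(admin_url, service_name, last_known_version, pid):
    # Path and parameter names are taken from the request= value in the log.
    base = admin_url.rstrip("/")
    query = urlencode(
        {"lastKnownVersion": last_known_version, "pluginId": pid},
        safe="@",  # keep '@' readable, as it appears in the log
    )
    return f"{base}/service/plugins/policies/download/{quote(service_name, safe='')}?{query}"


url = policy_download_url(
    "http://vmwhaddev01:6080",
    "nadcluster_hive",
    4,
    plugin_id("hiveServer2", "vmwhaddev01", "nadcluster_hive"),
)
print(url)
```

The service name appears both in the path and inside the pluginId, so the plugin keeps asking for whatever name it was configured with, regardless of what the service is called in the Ranger UI.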
Questions:
[UPDATE]
Ok, I spent some more time playing around. I figured out that the policy configuration is located here (note that the service name "nadcluster_hive" is in both the path and the filename):
/etc/ranger/nadcluster_hive/policycache/hiveServer2_nadcluster_hive.json
I performed some testing. With the service name in the Ranger UI set to "nadcluster_hive", I made various changes to one of the policies, like adding a new user, enabling/disabling table or column permissions, etc. I tailed the JSON file above and, every time I made a change and saved, I saw the file rewritten with the updates within 30 seconds. Cool. That seems right.
Next, I renamed the service to nadcluster_hive_1 and repeated the tests. The json file never once changed - as expected because the wrong REST URL is being used. But, I would have expected that maybe a brand new json file with the new service name would have appeared with a path like:
/etc/ranger/nadcluster_hive_1/policycache/hiveServer2_nadcluster_hive_1.json
But, it never did. So, is this expected behavior or a bug?
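The path pattern observed above can be written down as a small helper. This is a sketch only: both example paths come from this post, but the assumption that the cache file is always /etc/ranger/<service>/policycache/<appId>_<service>.json is inferred from that single observed file, and policy_cache_file is a hypothetical name.

```python
from pathlib import PurePosixPath


def policy_cache_file(service_name, app_id="hiveServer2", root="/etc/ranger"):
    # Assumed layout: <root>/<service>/policycache/<appId>_<service>.json
    return PurePosixPath(root) / service_name / "policycache" / f"{app_id}_{service_name}.json"


print(policy_cache_file("nadcluster_hive"))    # the file that was being rewritten
print(policy_cache_file("nadcluster_hive_1"))  # the file that never appeared
```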
Created 04-22-2016 04:53 AM
Created 04-22-2016 06:36 AM
The service name is the link between the Ranger plugin on the individual node (e.g. namenode, hiveserver, ...) and the Ranger UI. For example, if you go to your namenode and look at the file /etc/hadoop/conf/ranger-hdfs-security.xml, you will find an entry called ranger.plugin.hdfs.service.name, which corresponds to the service name in your Ranger UI/configuration. So if you change the service name via the Ranger UI, the Ranger plugin on the node still tries to get the policies for service nadcluster_hdfs, but it can't find any service with that name in Ranger (because it's called jupstats_hdfs now) and throws an error.
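To check which service name a plugin is configured with, the property can be read straight from the XML. A minimal sketch, assuming the file uses the usual Hadoop configuration layout (property elements with name and value children); the SAMPLE text stands in for the real file and read_property is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of /etc/hadoop/conf/ranger-hdfs-security.xml
SAMPLE = """<configuration>
  <property>
    <name>ranger.plugin.hdfs.service.name</name>
    <value>nadcluster_hdfs</value>
  </property>
</configuration>"""


def read_property(xml_text, key):
    # Return the value of the first <property> whose <name> matches key.
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None


service_name = read_property(SAMPLE, "ranger.plugin.hdfs.service.name")
print(service_name)
```

If the value printed here does not match a service defined in the Ranger UI, the plugin's policy download will fail with the "not found" error shown in the question.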
When you enable the Hive or HDFS plugin via Ambari, the service name will always be <clustername>_<service> (e.g. nadcluster_hdfs, nadcluster_hive). You can manually overwrite the value in the XML file, but be aware that every time you restart your HDFS/Hive service the value will be reset to <clustername>_<service>. There is currently no way to change the service name via Ambari.
If you have to change the service name at all costs, there is only one way (which I am not recommending!): disable the Ranger HDFS/Hive plugin in Ambari and enable it manually.
How to manually enable a Ranger Plugin:
1. Configure /usr/hdp/<version>/ranger-hdfs-plugin/install.properties
2. Set JAVA_HOME
3. Run /usr/hdp/<version>/ranger-hdfs-plugin/enable-hdfs-plugin.sh
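Step 1 mostly comes down to setting a couple of keys in install.properties. The sketch below edits properties text in memory; the key names POLICY_MGR_URL and REPOSITORY_NAME are my assumption of what the plugin's install.properties expects, so verify them against the file shipped with your version. set_property is a hypothetical helper and SAMPLE is a stand-in for the real file.

```python
def set_property(text, key, value):
    # Replace the first-class "key=value" line for key, or append it if absent.
    out, found = [], False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Stand-in for the relevant lines of install.properties (key names assumed)
SAMPLE = "# Ranger admin location\nPOLICY_MGR_URL=\nREPOSITORY_NAME=\n"

updated = set_property(SAMPLE, "POLICY_MGR_URL", "http://vmwhaddev01:6080")
updated = set_property(updated, "REPOSITORY_NAME", "jupstats_hdfs")
print(updated)
```

The repository name written here has to match the service name defined in the Ranger UI, otherwise the plugin will hit the same "not found" error.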
Let me know if that helps
Created 04-22-2016 06:31 PM
Thanks @Jonas Straub. That helps clarify the issue a bit, but I'm still left wondering. The Ranger manager GUI lets you add services (the big fat plus sign icon). Services must have different names, so they cannot all be <cluster>_<service>. Am I understanding this right? Say I have two Hive-based services called A and B. They have different users and different policies. I am thinking I would create two services under the Hive section in Ranger, one for A and one for B, to nicely organize these policies that do not overlap, and then define the users and policies for each. Is that not how services are meant to be used, to organize users/policies? Am I off track here? If not, then are you saying that the current version of Ranger in Ambari is limited and can only support ONE service per service type, like one for Hive, one for HDFS, etc., and that the names are locked down (as you already indicated)? I can leave the service names as default for now so that it works, but I'm just trying to understand the overall configuration concepts of Ranger for future planning.
Created 04-25-2016 09:25 AM
You can definitely add multiple services in the Ranger UI, e.g. I recently had to secure multiple SolrCloud clusters with one Ranger instance. Since every SolrCloud cluster handled its own policies, I had to add one Ranger service for each SolrCloud cluster. I named the Ranger services solrcloud01, solrcloud02 and solrcloud03 (this was not done through Ambari!).
Usually you have one Ranger service for each Hadoop service in your cluster (e.g. hive, hdfs,...), but you could use the same Ranger instance for different clusters. E.g. you could use one Ranger instance for mycluster_dev, mycluster_int, mycluster_prd (not recommending this!) and manage all policies in one place.
The naming convention <cluster>_<service> is only used when you enable the Ranger plugins through Ambari. When you enable the plugins manually (e.g. for Solr there is no Ambari support at the moment) you can choose your own name.