Member since: 09-25-2015
Posts: 33
Kudos Received: 41
Solutions: 9
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 502 | 04-11-2016 07:43 PM
 | 1454 | 01-13-2016 01:27 AM
 | 3406 | 12-17-2015 03:29 AM
 | 528 | 12-16-2015 11:13 PM
 | 295 | 12-08-2015 04:54 PM
06-02-2016
07:25 PM
2 Kudos
@Carter Everett and @luis marmolejo, the audit implementation has changed from HDP 2.3 onwards. Previously, the audits were written to a local file and then copied over to HDFS. From HDP 2.3 onwards, the audits are streamed directly to HDFS; they are written to the local spool folder only if the destination is not available.
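If it helps, here is a minimal sketch of the plugin-side audit properties involved, as I recall them from HDP 2.3 (verify the names against your ranger-<component>-audit config; the paths are just examples):
# stream audits straight to HDFS; spool locally only when HDFS is unreachable
xasecure.audit.destination.hdfs=true
xasecure.audit.destination.hdfs.dir=hdfs://namenode.example.com:8020/ranger/audit
xasecure.audit.destination.hdfs.batch.filespool.dir=/var/log/hadoop/audit/hdfs/spool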
06-02-2016
07:21 PM
1 Kudo
@Ramesh Mani, do we have references for enabling this via Ambari? If we manually modify the config file, it will be overwritten the next time Ambari restarts Ranger. This is assuming Ambari is used to manage the cluster.
04-20-2016
04:32 PM
2 Kudos
There is a slight clarification to your assumption. In the case of auditing to HDFS, we use the streaming API, so the audit events are written to HDFS in near real time. However, we close the file every 24 hours (the default, which is configurable), so on the HDFS side you won't be able to read the file till it is closed. If the process (NameNode) dies, the files are automatically closed, and we create a new file when the NameNode restarts.

For your second question, the following should answer it:
- If the destination is down, it will write to a local file and resume when the destination is available again.
- If the destination is slower than the rate at which the audits are generated, it will spool to a local file and throttle the writing, but it will eventually send the audits (the local spool size is configurable and depends on available disk space).
- If you are using components like HBase, Kafka or Solr, which generate very large numbers of audit records, it will summarize the audits at the source based on unique user + request and send the summarized audits.
- It uses separate queues and spool files for each destination, so if your destinations run at different speeds (e.g. Solr vs. HDFS), you will not lose audits and the faster destinations will get the audit records sooner.
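For reference, the two knobs mentioned above look roughly like this in the plugin audit config (property names as I recall them; double-check against your Ranger version):
# close/roll the HDFS audit file once a day (value in seconds)
xasecure.audit.destination.hdfs.file.rollover.sec=86400
# per-destination local spool folder, e.g. for the Solr destination
xasecure.audit.destination.solr.batch.filespool.dir=/var/log/hadoop/audit/solr/spool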
04-11-2016
10:15 PM
It would depend on which DB vendor you are using and what sort of license you have for your RDBMS. If you don't already have replication at the DB level, you might just go for a regular export of the DB; e.g. mysqldump in MySQL would be a good short-term solution for you.
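For example, a rough sketch assuming MySQL with the database name "ranger" and DB user "rangeradmin" (adjust to your setup):
# dump the Ranger policy DB to a dated file
mysqldump -u rangeradmin -p ranger > /backups/ranger-$(date +%F).sql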
04-11-2016
07:43 PM
1 Kudo
The best option is to just back up your Ranger database. For production, you should also consider setting up DB-level replication (active/active or active/passive). This will give you both HA and DR.
03-18-2016
12:15 PM
Don't forget to change the umask to either 077 or 027.
03-10-2016
06:35 PM
2 Kudos
Grant/revoke will work even with Ranger authorization enabled, in which case the policies are created in the Ranger DB.
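For example (standard HiveQL run through HiveServer2; the table and user names are just placeholders):
-- with the Ranger Hive plugin enabled, these end up as policies in the Ranger DB
GRANT SELECT ON TABLE default.customers TO USER bob;
REVOKE SELECT ON TABLE default.customers FROM USER bob;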
03-10-2016
06:34 PM
1 Kudo
Is there any specific reason you want to switch back to Hive SQL authorization?
03-04-2016
01:22 AM
2 Kudos
I generally also add "-b ~/cookie.txt -c ~/cookie.txt" to the curl command, and "-i" will give additional information as well: curl --negotiate -u : -b ~/cookie.txt -c ~/cookie.txt ...
02-24-2016
05:52 PM
Great article. One piece of feedback: while creating collections, instead of #shards=1 it should be &numShards=1: curl --negotiate -u : "http://horton04.example.com:8983/solr/admin/collections?action=CREATE&name=films&numShards=1"
02-02-2016
04:42 AM
1 Kudo
@Benson Shih, what does the audit say? It should show the policyId which gave the permission.
01-17-2016
10:41 PM
When security is enabled, we should also set the HDFS umask to 077 or 027.
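For reference, the corresponding HDFS setting is fs.permissions.umask-mode in core-site.xml/hdfs-site.xml (settable via Ambari under the HDFS configs), along these lines:
<!-- restrict default permissions on newly created files and directories -->
<property>
  <name>fs.permissions.umask-mode</name>
  <value>077</value>
</property>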
01-15-2016
06:52 AM
1 Kudo
Can you make sure that the password for amb_ranger_admin in Ranger Admin -> Users is the same as in Ambari -> Ranger -> Config -> Ambari Admin User?
01-15-2016
01:12 AM
1 Kudo
Also look into the Ranger audits from the Ranger Admin. If Ranger is allowing the request, then the audit will show the policy which gave the permission.
01-13-2016
05:32 PM
You will have to first see where the bottleneck is. Regardless of how much you are going to push to the Solr server, it can only index so much. If you feel transport is the main issue, then you can just create a couple of threads, and each thread can have its own SolrClient instance. Secondly, you should batch all your requests, and you shouldn't commit from the client side; configure auto-commit on the Solr server side and let it do the final commit. Between Solr doing the buffering vs. you doing the batching, I am not sure what the difference would be.
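For the server-side auto-commit, a rough solrconfig.xml sketch (the thresholds are just illustrative values):
<!-- flush buffered documents periodically without forcing commits from the client -->
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>30000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>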
01-13-2016
01:27 AM
@sdutta in SolrCloud you should be using the CloudSolrClient class. It should take care of everything you mentioned: it gets the active Solr servers from ZooKeeper, and when you add a document it will automatically send it to the server hosting the shard for that id, etc. It also keeps track of whether any Solr server is out of commission and automatically reconfigures itself.
CloudSolrClient solrCloudClient = new CloudSolrClient(zkHosts);
solrCloudClient.setDefaultCollection(collectionName);
01-05-2016
01:57 AM
1 Kudo
If you are using SolrJ from your client, then it will connect to ZooKeeper and automatically do the load balancing for you. If you are going to use SolrJ, then make sure you use the CloudSolrClient class.
12-17-2015
05:02 AM
1 Kudo
Yes, this is the expected behavior. Ranger policies are just for the ACLs, not for ownership. The right way to do this here is to use Ranger for all the ACLs. If you want root to access /user/oozie/test1, then from Ranger Admin you should give "root" the required access to the folder. Ideally, you shouldn't play with the owner and group.
12-17-2015
03:29 AM
2 Kudos
@Kuldeep Kulkarni, how are you setting the user admin as an administrator? Is the user admin in dfs.cluster.administrators? Do you have access to the user "hdfs"?
12-17-2015
03:19 AM
1 Kudo
I agree; just because we can do it doesn't mean we should do it. From an operations point of view, it is better to have one Ranger per Ambari cluster. This makes management very simple, and when it comes to upgrades it will cause fewer headaches.
12-17-2015
03:11 AM
There is no native support in Ambari to do this. If you are using Ambari in all environments, then the Ambari which is hosting the main Ranger instance is oblivious of the other clusters that Ranger is supporting. The Ambari which is hosting Ranger will automatically configure Ranger for the components within its cluster. For the other clusters, you have to go to each component and modify the Ranger properties; e.g. you will have to set the ranger.plugin.hbase.policy.rest.url property and a few others. You also need to add all the services/repos using the Ranger Admin UI.
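As a rough illustration for the HBase plugin in one of the other clusters (property names from the ranger-hbase-security config; the URL and service name are just placeholders):
# point the plugin at the central Ranger Admin and at the repo/service created for this cluster
ranger.plugin.hbase.policy.rest.url=http://rangeradmin.example.com:6080
ranger.plugin.hbase.service.name=cluster2_hbase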
12-16-2015
11:13 PM
2 Kudos
The primary limitation is with UserSync. If there are multiple clusters but they use the same AD/LDAP, then you can use the same Ranger instance to manage all of them.
12-08-2015
05:05 PM
1 Kudo
The upcoming patch for Ranger should support specifying the ZooKeeper quorum used by Ranger as a property.
12-08-2015
04:54 PM
2 Kudos
In Kafka, topic creation and deletion are still done directly at the ZooKeeper level and don't go through the broker. If you are using HDP, then out of the box only the principal "kafka" has permission to do these operations. In future releases, the Kafka community will support creation of topics via the broker. Until then, there is not much option but to manage the create/delete permissions using ZooKeeper ACLs.
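If you need to inspect or tighten those ACLs, a sketch from the zookeeper-client shell (the znode path and the sasl principal shown are the usual Kafka ones, but verify in your environment before changing anything):
getAcl /brokers/topics
setAcl /brokers/topics sasl:kafka:cdrwa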
12-08-2015
01:45 AM
1 Kudo
I have found issues when you are using the latest MySQL 5.7 with Ranger. To work around them, you need to add the following in /etc/my.cnf:
show_compatibility_56 = on
explicit_defaults_for_timestamp
11-26-2015
06:36 PM
1 Kudo
All of @Andrew Grande's points are valid. You should also consider the performance impact when you store the indexes in HDFS, because Solr pulls the indexes from HDFS and keeps them in memory, so you will have to plan your hardware capacity carefully.
11-26-2015
06:33 PM
1 Kudo
For your question specific to storing Ranger audits: if you envision that a lot of audit logs will be generated, then you should create multiple shards with enough replicas for high availability and performance. Another recommendation is to store the Ranger audits in both HDFS and Solr. The HDFS storage will be for archival and compliance reasons. On the Solr side, you can set up a maximum retention to delete the audit logs after a certain number of days.
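One way to do the Solr-side retention, sketched from memory, is Solr's document-expiration update processor added to the update request processor chain in solrconfig.xml (the field names and period below are defaults/illustrative; verify against your Solr version):
<!-- periodically delete documents once their expiration time has passed -->
<processor class="solr.processor.DocExpirationUpdateProcessorFactory">
  <int name="autoDeletePeriodSeconds">86400</int>
  <str name="ttlFieldName">_ttl_</str>
  <str name="expirationFieldName">_expire_at_</str>
</processor>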
11-06-2015
11:26 PM
12 Kudos
Since you brought up this blog, there are three things you need to know: 1. authentication, 2. user/group mapping and 3. authorization.

1. For authentication, there is no alternative to Kerberos. Once your cluster is Kerberized, you can make certain access paths easier by using AD/LDAP, for example access to HS2 via AD/LDAP authentication, or accessing various services using Knox.

2. Group mapping can be done in three ways. One is as the blog says, where you look up AD/LDAP to get the groups for the users. The second is to materialize the AD/LDAP users on the Linux servers using SSSD, Centrify, etc. The third is to manually create the users and groups in the Linux environment. All these options are applicable regardless of whether you have Kerberos or not.

3. Authorization can be done via Ranger or using the natively supported ACLs. Except for Storm and Kafka, having Kerberos is not mandatory.

Without reliable authentication, authorization and auditing are meaningless. Regarding your common use case, "User A logs into the system with his AD credentials, HDFS or Hive ACLs kick in for authorization": you have to qualify "system". Which system are you logging into? Only HS2 and Knox allow you to log in via AD/LDAP. If you are planning to do that, then you have to set up a very tight firewall around your Hadoop cluster. Absolutely no one should be able to connect to the NameNode, DataNode or any other service port from outside the cluster, except to the JDBC port of HS2 or the Knox port. If you can set up this firewall, then all business users will be secure even if you don't Kerberize your cluster. However, any user who has shell login/port access to an edge node or the cluster, or is able to submit a custom job in the cluster, will be able to impersonate anyone.

Setting up this firewall is not a trivial thing. Even if you do, there will be users who need access to the cluster. There should be a limited number of such users, and these users should be trusted. And you should not let any unapproved job run within the cluster. If the customer is okay with all these "ifs" and comfortable with a limited number of super-admin users, then yes, you can have security without Kerberos.
10-14-2015
05:29 PM
@rgarcia@hortonworks.com If the admin user is synchronized from AD, then you will have to update it in the Ambari DB. You should probably first create a backup admin user with a different name and Admin privileges in Ambari.
mysql> use ambaricustom
mysql> update users set ldap_user=0 where user_name='admin';