Member since 09-15-2015
457 Posts, 507 Kudos Received, 90 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 15662 | 11-01-2016 08:16 AM |
| | 11080 | 11-01-2016 07:45 AM |
| | 8559 | 10-25-2016 09:50 AM |
| | 1918 | 10-21-2016 03:50 AM |
| | 3822 | 10-14-2016 03:12 PM |
12-01-2015
08:58 AM
Awesome project 🙂
11-30-2015
08:31 AM
@Ali Bajwa I saw the same error with Spark 1.4.1; it did work with 1.5.1, so it looks like it's fixed.
11-27-2015
08:18 AM
Thanks @Neeraj Sabharwal, great article. It helped a lot with my decision.
11-27-2015
08:07 AM
Thanks @Andrew Grande, very good points; I had totally forgotten about these. For Ranger audit logs these points don't matter that much (the logs are also saved to HDFS, Solr log retention is under 30 days, etc.), but for other projects they do! I really like the point about scalability and reliability: I don't have to plan storage for Solr separately or reserve space on my nodes, I can just scale with HDFS 🙂
11-27-2015
07:51 AM
@bdurai thanks. I have already set up Ranger audit logging to both Solr and HDFS. The Solr audit logs will automatically be deleted after 30 days (document expiration). Currently I am using a replication factor of 2 as well as 2 shards, but I might be able to increase this further.
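For reference, a layout of 2 shards with a replication factor of 2 corresponds to the `numShards` and `replicationFactor` parameters of the SolrCloud Collections API `CREATE` action. The sketch below only builds the request URL (it does not send it); the Solr host, the collection name `ranger_audits` and the config set name are assumptions for illustration, so check them against your own Ranger/Solr setup.

```python
from urllib.parse import urlencode


def create_collection_url(solr_base: str, name: str, num_shards: int,
                          replication_factor: int, config_name: str) -> str:
    """Build a SolrCloud Collections API CREATE URL (the request is not sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                  # number of shards
        "replicationFactor": replication_factor,  # copies of each shard
        "collection.configName": config_name,     # config set stored in ZooKeeper
    }
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, collection and config set names.
    print(create_collection_url("http://solr01.example.com:8983/solr",
                                "ranger_audits", 2, 2, "ranger_audits"))
```

Increasing either value later means creating the collection with new parameters (or splitting shards / adding replicas). On older SolrCloud releases with fewer nodes than shards × replicas, you may also need to raise `maxShardsPerNode`.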
11-26-2015
09:12 AM
1 Kudo
According to the docs, Solr relies heavily on fast bulk reads and writes during index updates. Let's say I want to index thousands of documents (Word, PDF, HTML, ...) or store my Ranger audit logs in my SolrCloud. Is it a good idea to use HDFS as the index and data store, or should I go with a local, non-HDFS data directory? The Ranger audit log documentation mentions "1 TB free space in the volume where Solr will store the index data", which sounds like non-HDFS?!
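To make the HDFS option concrete: the Solr reference guide for the 5.x/6.x releases ("Running Solr on HDFS") describes starting SolrCloud with the HDFS directory factory via system properties. The sketch below only assembles that `bin/solr start` command line; the NameNode address, the HDFS path and the ZooKeeper host are hypothetical, and the property names should be double-checked against the Solr version you actually run.

```python
import shlex
from typing import List, Optional


def solr_hdfs_start_args(hdfs_home: str, zk_host: Optional[str] = None,
                         solr_bin: str = "bin/solr") -> List[str]:
    """Build a 'bin/solr start' command line that keeps index data in HDFS."""
    args = [
        solr_bin, "start",
        "-c",                                            # run in SolrCloud mode
        "-Dsolr.directoryFactory=HdfsDirectoryFactory",  # index files go to HDFS
        "-Dsolr.lock.type=hdfs",                         # HDFS-based index locking
        f"-Dsolr.hdfs.home={hdfs_home}",                 # HDFS root for Solr data
    ]
    if zk_host:
        args += ["-z", zk_host]  # external ZooKeeper ensemble for SolrCloud
    return args


if __name__ == "__main__":
    # Hypothetical NameNode and ZooKeeper hosts; print instead of executing.
    cmd = solr_hdfs_start_args("hdfs://namenode.example.com:8020/solr",
                               zk_host="zk1.example.com:2181")
    print(shlex.join(cmd))
```

Leaving these properties out keeps Solr on its default local-disk directory, which is what the "1 TB free space in the volume" requirement appears to assume.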
Labels:
- Apache Hadoop
- Apache Ranger
- Apache Solr
11-25-2015
10:54 PM
7 Kudos
@Olivier Renault I don't think we have a separate audit tool or a record of configuration changes available; however, a short Python script should solve this problem. I just created a short example (a quick-and-dirty solution that needs some tweaking! :P), take a look at https://github.com/mr-jstraub/ambari-audit-config

The repo contains an audit.py script that you can use as follows.

Example (audit hive-site, print to the shell):
python audit.py --target horton01.myhost.com:8080 --cluster bigdata --user admin --config hive-site

Example (audit hive-site, write to hive-site_audit.log):
python audit.py --target horton01.myhost.com:8080 --cluster bigdata --user admin --config hive-site --output hive-site_audit.log

Result:
hive-site: version 1 - ADDED - javax.jdo.option.ConnectionDriverName - com.mysql.jdbc.Driver
hive-site: version 1 - ADDED - hive.fetch.task.aggr - false
hive-site: version 1 - ADDED - hive.execution.engine - tez
hive-site: version 1 - ADDED - hive.tez.java.opts - -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
hive-site: version 1 - ADDED - hive.vectorized.groupby.maxentries - 100000
hive-site: version 1 - ADDED - hive.server2.table.type.mapping - CLASSIC
...
...
...
hive-site: version 1 - ADDED - hive.compactor.check.interval - 300L
hive-site: version 1 - ADDED - hive.compactor.delta.pct.threshold - 0.1f
hive-site: version 2 - CHANGED - javax.jdo.option.ConnectionURL - jdbc:mysql://horton03.myhost.com/hive?createDatabaseIfNotExist=true => jdbc:mysql://horton03.myhost.com:3306/hive?createDatabaseIfNotExist=true
hive-site: version 2 - CHANGED - hive.zookeeper.quorum - horton03.myhost.com:2181,horton02.myhost.com:2181,horton01.myhost.com:2181 => horton02.myhost.com:2181,horton03.myhost.com:2181,horton01.myhost.com:2181
hive-site: version 2 - CHANGED - hive.cluster.delegation.token.store.zookeeper.connectString - horton03.myhost.com:2181,horton02.myhost.com:2181,horton01.myhost.com:2181 => horton02.myhost.com:2181,horton03.myhost.com:2181,horton01.myhost.com:2181
hive-site: version 3 - CHANGED - javax.jdo.option.ConnectionURL - jdbc:mysql://horton03.myhost.com:3306/hive?createDatabaseIfNotExist=true => jdbc:mysql://horton03.myhost.com/hive?createDatabaseIfNotExist=true
hive-site: version 4 - ADDED - atlas.cluster.name - default
hive-site: version 4 - CHANGED - hive.exec.post.hooks - org.apache.hadoop.hive.ql.hooks.ATSHook => org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.atlas.hive.hook.HiveHook
hive-site: version 4 - CHANGED - hive.metastore.sasl.enabled - false => true
hive-site: version 4 - CHANGED - hive.server2.authentication.spnego.principal - /etc/security/keytabs/spnego.service.keytab => HTTP/_HOST@EXAMPLE.COM
hive-site: version 4 - CHANGED - hive.server2.authentication.spnego.keytab - HTTP/_HOST@EXAMPLE.COM => /etc/security/keytabs/spnego.service.keytab
hive-site: version 4 - ADDED - hive.server2.authentication.kerberos.keytab - /etc/security/keytabs/hive.service.keytab
hive-site: version 4 - CHANGED - hive.zookeeper.quorum - horton02.myhost.com:2181,horton03.myhost.com:2181,horton01.myhost.com:2181 => horton03.myhost.com:2181,horton02.myhost.com:2181,horton01.myhost.com:2181
hive-site: version 4 - ADDED - hive.server2.authentication.kerberos.principal - hive/_HOST@EXAMPLE.COM
hive-site: version 4 - ADDED - atlas.rest.address - http://horton03.myhost.com:21000
hive-site: version 4 - CHANGED - hive.cluster.delegation.token.store.zookeeper.connectString - horton02.myhost.com:2181,horton03.myhost.com:2181,horton01.myhost.com:2181 => horton03.myhost.com:2181,horton02.myhost.com:2181,horton01.myhost.com:2181
hive-site: version 4 - CHANGED - hive.server2.authentication - NONE => KERBEROS
hive-site: version 5 - CHANGED - atlas.cluster.name - default => bigdata
hive-site: version 6 - ADDED - my.prop.test - blub
I still need to add the username; however, I haven't found it for every config version. Does anyone know if I can retrieve the username of the person who changed the configuration?

Hope that helps 🙂

Update: I found the usernames, but I need to map the config type (hive-site, hive-env, ...) to the service name (HIVE), which is a bit tricky:
http://horton01.myhost.com:8080/api/v1/clusters/bigdata/configurations/service_config_versions?service_name=HIVE&fields=service_config_version,user,hosts,service_name,service_config_version_note,stack_id,is_cluster_compatible&minimal_response=true
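The core of an audit like the hive-site listing above is a diff between consecutive config versions. The following is a minimal sketch of that idea, not the code in the audit.py repo: it assumes each version has already been fetched (for example from the Ambari REST API) as a plain dict of properties, passed in version order. The function names and the REMOVED action are hypothetical; the sample output only shows ADDED and CHANGED.

```python
from typing import Dict, List

Props = Dict[str, str]


def diff_versions(config_type: str, version: int, old: Props, new: Props) -> List[str]:
    """Compare two versions of one config type and emit audit lines."""
    lines = []
    for key, value in new.items():
        if key not in old:
            lines.append(f"{config_type}: version {version} - ADDED - {key} - {value}")
        elif old[key] != value:
            lines.append(f"{config_type}: version {version} - CHANGED - {key} - "
                         f"{old[key]} => {value}")
    for key, value in old.items():
        if key not in new:  # hypothetical action, not in the sample output
            lines.append(f"{config_type}: version {version} - REMOVED - {key} - {value}")
    return lines


def audit(config_type: str, versions: List[Props]) -> List[str]:
    """Walk versions 1..n in order; version 1 is diffed against an empty config."""
    lines: List[str] = []
    previous: Props = {}
    for number, props in enumerate(versions, start=1):
        lines.extend(diff_versions(config_type, number, previous, props))
        previous = props
    return lines


if __name__ == "__main__":
    # Hypothetical property snapshots for illustration only.
    history = [
        {"hive.execution.engine": "tez"},
        {"hive.execution.engine": "tez", "atlas.cluster.name": "default"},
        {"hive.execution.engine": "tez", "atlas.cluster.name": "bigdata"},
    ]
    for line in audit("hive-site", history):
        print(line)
```

Because version 1 is compared against an empty dict, every initial property shows up as ADDED, which matches the shape of the version 1 lines in the listing.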
11-25-2015
06:10 AM
1 Kudo
I have not run distcp between different Kerberos realms, but I think it should be possible. Our documentation only states that the "same principal name must be assigned to the applicable NameNodes", so that the auth_to_local configuration maps the principal to the same username on both sides (e.g. the Kerberos principal nn/host1@realm becomes the user "nn"). As long as both realms use the same KDC, or the KDCs trust each other, this should work.
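To illustrate the auth_to_local point: in Hadoop these rules live in `hadoop.security.auth_to_local` (core-site.xml), and a rule such as `RULE:[2:$1@$0](nn@(REALM\.A|REALM\.B))s/@.*//` maps NameNode principals from both realms to the short name "nn". The sketch below is a simplified Python model of how such a rule is evaluated, not Hadoop's implementation: it ignores the DEFAULT rule and the `g`/`/L` flags, and the realm names REALM.A and REALM.B are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """Simplified model of one RULE:[N:format](match)s/from/to/ entry."""
    num_components: int  # N: number of '/'-separated name components
    fmt: str             # e.g. "$1@$0"; $0 = realm, $1 = first component, ...
    match: str           # regex the formatted string must match completely
    sed_from: str        # regex part of s/from/to/
    sed_to: str          # replacement part of s/from/to/


def apply_rule(rule: Rule, principal: str) -> Optional[str]:
    """Return the short name if the rule applies to the principal, else None."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    if len(components) != rule.num_components:
        return None
    values = [realm] + components  # index 0 is the realm, 1.. the components
    formatted = re.sub(r"\$(\d)", lambda m: values[int(m.group(1))], rule.fmt)
    if re.fullmatch(rule.match, formatted) is None:
        return None
    # Like a sed substitution without 'g': only the first match is replaced.
    return re.sub(rule.sed_from, rule.sed_to, formatted, count=1)


def short_name(rules: List[Rule], principal: str) -> str:
    """Rules are tried in order; the first one that applies wins."""
    for rule in rules:
        result = apply_rule(rule, principal)
        if result is not None:
            return result
    raise ValueError(f"no auth_to_local rule matches {principal}")


if __name__ == "__main__":
    # Model of RULE:[2:$1@$0](nn@(REALM\.A|REALM\.B))s/@.*//
    rules = [Rule(2, "$1@$0", r"nn@(REALM\.A|REALM\.B)", r"@.*", "")]
    print(short_name(rules, "nn/host1.a.example.com@REALM.A"))
    print(short_name(rules, "nn/host2.b.example.com@REALM.B"))
```

Both principals resolve to "nn", which is the property the documentation relies on; a principal from a third realm would not match and would need its own rule.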