Member since: 09-15-2015
Posts: 457
Kudos Received: 507
Solutions: 90
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 15659 | 11-01-2016 08:16 AM |
| | 11080 | 11-01-2016 07:45 AM |
| | 8554 | 10-25-2016 09:50 AM |
| | 1917 | 10-21-2016 03:50 AM |
| | 3822 | 10-14-2016 03:12 PM |
12-01-2015 08:58 AM
Awesome project 🙂
11-30-2015 08:31 AM
@Ali Bajwa I saw the same error with Spark 1.4.1; it did work with 1.5.1, so it looks like it's fixed.
11-27-2015 08:18 AM
Thanks @Neeraj Sabharwal, great article. This helped a lot with my decision.
11-27-2015 08:07 AM
Thanks @Andrew Grande, very good points; I totally forgot about these. For Ranger audit logs these points don't matter that much (the logs are also saved to HDFS, Solr log retention is under 30 days, etc.), but for other projects they do! I really like the point about scalability and reliability: I don't have to plan the storage for Solr separately or reserve space on my nodes, I can scale with HDFS 🙂
11-27-2015 07:51 AM
@bdurai thanks. I have already set up Solr and HDFS Ranger audit logs. Solr logs will automatically be deleted after 30 days (document expiration). Currently I am using a replication factor of 2 as well as 2 shards, but I might be able to increase this even more.
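For reference, the 30-day cleanup is handled by Solr's document expiration. A minimal sketch of the relevant solrconfig.xml processor, assuming illustrative field names and check interval (Ranger's shipped configuration may differ):

```
<updateRequestProcessorChain name="docExpiry" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- scan for and delete expired documents once per day -->
    <int name="autoDeletePeriodSeconds">86400</int>
    <!-- per-document time-to-live (e.g. +30DAYS) is read from this field -->
    <str name="ttlFieldName">_ttl_</str>
    <!-- the computed absolute expiration timestamp is stored in this field -->
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```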
11-26-2015 09:12 AM
1 Kudo
According to the docs, Solr relies heavily on fast bulk reads and writes during index updates. Let's say I want to index thousands of documents (Word, PDF, HTML, ...) or I want to store my Ranger audit logs in my SolrCloud. Is it a good idea to use HDFS as the index and data store, or should I go with a local, non-HDFS data directory? The Ranger audit logs documentation mentions "1 TB free space in the volume where Solr will store the index data.", which sounds like non-HDFS?!
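For context, Solr can be pointed at HDFS via its HdfsDirectoryFactory. A minimal sketch of the relevant solrconfig.xml pieces, assuming a placeholder NameNode address and Solr home path:

```
<!-- store the index on HDFS instead of the local disk -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode.myhost.com:8020/solr</str>
  <!-- directory containing the Hadoop client configuration -->
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <!-- the block cache compensates for HDFS's weaker random-read performance -->
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>

<!-- use the HDFS-aware lock implementation -->
<lockType>hdfs</lockType>
```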
Labels:
- Apache Hadoop
- Apache Ranger
- Apache Solr
11-25-2015 10:54 PM
7 Kudos
@Olivier Renault I don't think we have a separate audit tool or a record of configuration changes available; however, a short Python script should solve this problem. I just created a quick example (quick and dirty, needs some tweaking! :P), take a look at https://github.com/mr-jstraub/ambari-audit-config

The repo contains an audit.py script that you can use as follows.

Example (audit hive-site to the shell):

python audit.py --target horton01.myhost.com:8080 --cluster bigdata --user admin --config hive-site

Example (audit hive-site to hive-site_audit.log):

python audit.py --target horton01.myhost.com:8080 --cluster bigdata --user admin --config hive-site --output hive-site_audit.log

Result:

hive-site: version 1 - ADDED - javax.jdo.option.ConnectionDriverName - com.mysql.jdbc.Driver
hive-site: version 1 - ADDED - hive.fetch.task.aggr - false
hive-site: version 1 - ADDED - hive.execution.engine - tez
hive-site: version 1 - ADDED - hive.tez.java.opts - -server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
hive-site: version 1 - ADDED - hive.vectorized.groupby.maxentries - 100000
hive-site: version 1 - ADDED - hive.server2.table.type.mapping - CLASSIC
...
...
...
hive-site: version 1 - ADDED - hive.compactor.check.interval - 300L
hive-site: version 1 - ADDED - hive.compactor.delta.pct.threshold - 0.1f
hive-site: version 2 - CHANGED - javax.jdo.option.ConnectionURL - jdbc:mysql://horton03.myhost.com/hive?createDatabaseIfNotExist=true => jdbc:mysql://horton03.myhost.com:3306/hive?createDatabaseIfNotExist=true
hive-site: version 2 - CHANGED - hive.zookeeper.quorum - horton03.myhost.com:2181,horton02.myhost.com:2181,horton01.myhost.com:2181 => horton02.myhost.com:2181,horton03.myhost.com:2181,horton01.myhost.com:2181
hive-site: version 2 - CHANGED - hive.cluster.delegation.token.store.zookeeper.connectString - horton03.myhost.com:2181,horton02.myhost.com:2181,horton01.myhost.com:2181 => horton02.myhost.com:2181,horton03.myhost.com:2181,horton01.myhost.com:2181
hive-site: version 3 - CHANGED - javax.jdo.option.ConnectionURL - jdbc:mysql://horton03.myhost.com:3306/hive?createDatabaseIfNotExist=true => jdbc:mysql://horton03.myhost.com/hive?createDatabaseIfNotExist=true
hive-site: version 4 - ADDED - atlas.cluster.name - default
hive-site: version 4 - CHANGED - hive.exec.post.hooks - org.apache.hadoop.hive.ql.hooks.ATSHook => org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.atlas.hive.hook.HiveHook
hive-site: version 4 - CHANGED - hive.metastore.sasl.enabled - false => true
hive-site: version 4 - CHANGED - hive.server2.authentication.spnego.principal - /etc/security/keytabs/spnego.service.keytab => HTTP/_HOST@EXAMPLE.COM
hive-site: version 4 - CHANGED - hive.server2.authentication.spnego.keytab - HTTP/_HOST@EXAMPLE.COM => /etc/security/keytabs/spnego.service.keytab
hive-site: version 4 - ADDED - hive.server2.authentication.kerberos.keytab - /etc/security/keytabs/hive.service.keytab
hive-site: version 4 - CHANGED - hive.zookeeper.quorum - horton02.myhost.com:2181,horton03.myhost.com:2181,horton01.myhost.com:2181 => horton03.myhost.com:2181,horton02.myhost.com:2181,horton01.myhost.com:2181
hive-site: version 4 - ADDED - hive.server2.authentication.kerberos.principal - hive/_HOST@EXAMPLE.COM
hive-site: version 4 - ADDED - atlas.rest.address - http://horton03.myhost.com:21000
hive-site: version 4 - CHANGED - hive.cluster.delegation.token.store.zookeeper.connectString - horton02.myhost.com:2181,horton03.myhost.com:2181,horton01.myhost.com:2181 => horton03.myhost.com:2181,horton02.myhost.com:2181,horton01.myhost.com:2181
hive-site: version 4 - CHANGED - hive.server2.authentication - NONE => KERBEROS
hive-site: version 5 - CHANGED - atlas.cluster.name - default => bigdata
hive-site: version 6 - ADDED - my.prop.test - blub
I still need to add the username; however, I haven't found it for every config version. Does anyone know if I can retrieve the username of the person who changed the configuration? Hope that helps 🙂

Update: Found the usernames, but I need to map the config type (hive-site, hive-env, ...) to the service name (HIVE), which is a bit tricky: http://horton01.myhost.com:8080/api/v1/clusters/bigdata/configurations/service_config_versions?service_name=HIVE&fields=service_config_version,user,hosts,service_name,service_config_version_note,stack_id,is_cluster_compatible&minimal_response=true
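To make that endpoint concrete, here is a minimal Python sketch of how the usernames could be pulled from the Ambari API (hostnames and credentials are the placeholders from the examples above; error handling is omitted):

```python
import requests

# Placeholders taken from the examples above
AMBARI = "http://horton01.myhost.com:8080"
CLUSTER = "bigdata"

# Ask Ambari for every saved config version of the HIVE service,
# including the user who saved it and the version note.
url = (AMBARI + "/api/v1/clusters/" + CLUSTER
       + "/configurations/service_config_versions"
       + "?service_name=HIVE"
       + "&fields=service_config_version,user,service_config_version_note"
       + "&minimal_response=true")

resp = requests.get(url, auth=("admin", "admin"),
                    headers={"X-Requested-By": "ambari"})
resp.raise_for_status()

for item in resp.json().get("items", []):
    print("version %s changed by %s" %
          (item["service_config_version"], item.get("user", "?")))
```

Note the remaining mapping problem: audit.py works per config type (hive-site), while this endpoint is keyed by service name (HIVE), so a config-type-to-service lookup is still needed.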
11-25-2015 06:10 AM
1 Kudo
I have not done distcp with different Kerberos realms, but I think this should be possible. Our documentation only mentions that the "same principal name must be assigned to the applicable NameNodes", so that the auth_to_local configuration can derive the same username on both sides (the Kerberos principal nn/host1@realm becomes the user "nn"). As long as the different realms use the same KDC or the KDCs trust each other, this should be possible.
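As an illustration, a sketch of what matching auth_to_local rules could look like in core-site.xml on both clusters (REALM.ONE and REALM.TWO are placeholder realm names; this is not a tested configuration):

```
<!-- map the NameNode principal from either realm to the local user "nn" -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](nn@REALM\.ONE)s/.*/nn/
    RULE:[2:$1@$0](nn@REALM\.TWO)s/.*/nn/
    DEFAULT
  </value>
</property>
```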