Member since 10-20-2015 | 92 Posts | 79 Kudos Received | 9 Solutions
05-17-2019
06:50 PM
1 Kudo
Customers have asked how they can review Ranger audit archive logs stored on HDFS, since the UI (backed by Infra Solr) only shows the last 90 days of data. I decided to approach the problem using Zeppelin/Spark for a fun example.

1. Prerequisites - Zeppelin and Spark2 installed on your system, plus Ranger with its audit logs being stored in HDFS. Create an HDFS policy in Ranger that allows your zeppelin user to read and execute, recursively, on the /ranger/audit directory.

2. Create your notebook in Zeppelin with some code like the following example:

%spark2.spark
// --Specify service and date if you wish
//val path = "/ranger/audit/hdfs/20190513/*.log"
// --Be brave and map the whole enchilada
val path = "/ranger/audit/*/*/*.log"
// --read in the json and drop any malformed json
val rauditDF = spark.read.option("mode", "DROPMALFORMED").json(path)
// --print the schema to review and show me top 20 lines.
rauditDF.printSchema()
rauditDF.show(20,false)
// --Do some spark sql on the data and look for denials
println("sparksql--------------------")
rauditDF.createOrReplaceTempView(viewName="audit")
import org.apache.spark.sql.functions.{col, when}
val readAccessDF = spark.sql("SELECT reqUser, repo, access, action, evtTime, policy, resource, reason, enforcer, result FROM audit WHERE result='0'")
  .withColumn("new_result", when(col("result") === "1", "Allowed").otherwise("Denied"))
readAccessDF.show(20,false)

3. Output should look something like:

path: String = /ranger/audit/*/*/*.log
rauditDF: org.apache.spark.sql.DataFrame = [access: string, action: string ... 23 more fields]
root
|-- access: string (nullable = true)
|-- action: string (nullable = true)
|-- additional_info: string (nullable = true)
|-- agentHost: string (nullable = true)
|-- cliIP: string (nullable = true)
|-- cliType: string (nullable = true)
|-- cluster_name: string (nullable = true)
|-- enforcer: string (nullable = true)
|-- event_count: long (nullable = true)
|-- event_dur_ms: long (nullable = true)
|-- evtTime: string (nullable = true)
|-- id: string (nullable = true)
|-- logType: string (nullable = true)
|-- policy: long (nullable = true)
|-- reason: string (nullable = true)
|-- repo: string (nullable = true)
|-- repoType: long (nullable = true)
|-- reqData: string (nullable = true)
|-- reqUser: string (nullable = true)
|-- resType: string (nullable = true)
|-- resource: string (nullable = true)
|-- result: long (nullable = true)
|-- seq_num: long (nullable = true)
|-- sess: string (nullable = true)
|-- tags: array (nullable = true)
| |-- element: string (containsNull = true)
sparksql--------------------
readAccessDF: org.apache.spark.sql.DataFrame = [reqUser: string, repo: string ... 9 more fields]
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+
|reqUser |repo |access |action |evtTime |policy|resource |reason |enforcer |result|new_result|
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+
|dav |c3205_hadoop|READ_EXECUTE|execute|2019-05-13 22:07:23.971|-1 |/ranger/audit/hdfs |/ranger/audit/hdfs |hadoop-acl|0 |Denied |
|zeppelin|c3205_hadoop|READ_EXECUTE|execute|2019-05-13 22:10:47.288|-1 |/ranger/audit/hdfs |/ranger/audit/hdfs |hadoop-acl|0 |Denied |
|dav |c3205_hadoop|EXECUTE |execute|2019-05-13 23:57:49.410|-1 |/ranger/audit/hiveServer2/20190513/hiveServer2_ranger_audit_c3205-node3.hwx.local.log|/ranger/audit/hiveServer2/20190513|hadoop-acl|0 |Denied |
|zeppelin|c3205_hive |USE |_any |2019-05-13 23:42:50.643|-1 |null |null |ranger-acl|0 |Denied |
|zeppelin|c3205_hive |USE |_any |2019-05-13 23:43:08.732|-1 |default |null |ranger-acl|0 |Denied |
|dav |c3205_hive |USE |_any |2019-05-13 23:48:37.603|-1 |null |null |ranger-acl|0 |Denied |
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+

4. You can proceed to run more SQL against the audit view if you so desire.

5. You may need to fine-tune your Spark interpreter in Zeppelin to meet your needs, e.g. SPARK_DRIVER_MEMORY, spark.executor.cores, spark.executor.instances, and spark.executor.memory. It helped to see what was happening by tailing the Zeppelin log for Spark:

tailf zeppelin-interpreter-spark2-spark-zeppelin-cluster1.hwx.log
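If you want to dig further, step 4 above can be extended with an aggregation, for example to see which users rack up the most denials. A minimal sketch for the same %spark2.spark paragraph, reusing the audit temp view created earlier; the column names come from the schema printed above, but the query itself is my own illustration, not part of the original notebook:

```
// --Count denied events per user, repo, and access type, most frequent first
val topDeniedDF = spark.sql("""
  SELECT reqUser, repo, access, COUNT(*) AS denials
  FROM audit
  WHERE result = 0
  GROUP BY reqUser, repo, access
  ORDER BY denials DESC
""")
topDeniedDF.show(20, false)
```

This runs against the same DataFrame-backed view, so it inherits whatever date range your path glob selected.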
10-15-2018
03:23 PM
My pleasure! @Jasper
05-23-2018
02:22 AM
2 Kudos
Example topology for Kerberos auth and Hive:

[root@groot1 hive]# cat /etc/knox/2.6.0.3-8/0/topologies/kerberos.xml

<topology>
<gateway>
<provider>
<role>authentication</role>
<name>HadoopAuth</name>
<enabled>true</enabled>
<param>
<name>config.prefix</name>
<value>hadoop.auth.config</value>
</param>
<param>
<name>hadoop.auth.config.signature.secret</name>
<value>hadoop12345!</value>
</param>
<param>
<name>hadoop.auth.config.type</name>
<value>kerberos</value>
</param>
<param>
<name>hadoop.auth.config.simple.anonymous.allowed</name>
<value>false</value>
</param>
<param>
<name>hadoop.auth.config.token.validity</name>
<value>1800</value>
</param>
<param>
<name>hadoop.auth.config.cookie.domain</name>
<value>openstacklocal</value>
</param>
<param>
<name>hadoop.auth.config.cookie.path</name>
<value>/gateway/kerberos/hive</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.principal</name>
<value>HTTP/groot1.openstacklocal@SUPPORT.COM</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.keytab</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.name.rules</name>
<value>DEFAULT</value>
</param>
</provider>
<provider>
<role>identity-assertion</role>
<name>Default</name>
<enabled>true</enabled>
</provider>
<provider>
<role>authorization</role>
<name>AclsAuthz</name>
<enabled>false</enabled>
</provider>
</gateway>
<service>
<role>NAMENODE</role>
<url>hdfs://groot1.openstacklocal:8020</url>
</service>
<service>
<role>JOBTRACKER</role>
<url>rpc://master2.openstacklocal:8050</url>
</service>
<service>
<role>WEBHDFS</role>
<url>http://groot1.openstacklocal:50070/webhdfs</url>
</service>
<service>
<role>WEBHCAT</role>
<url>http://master2.openstacklocal:50111/templeton</url>
</service>
<service>
<role>HIVE</role>
<url>http://groot1.openstacklocal:10001/cliservice</url>
</service>
<service>
<role>RESOURCEMANAGER</role>
<url>http://master2.openstacklocal:8088/ws</url>
</service>
</topology>

Example of how to use it (don't forget the Knox proxyuser settings in core-site.xml, and if you run into trouble restart both Hive and Knox):

[root@groot1 hive]# kinit dvillarreal
Password for dvillarreal@SUPPORT.COM:
[root@groot1 hive]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: dvillarreal@SUPPORT.COM
Valid starting Expires Service principal
05/22/18 22:54:43 05/23/18 08:54:40 krbtgt/SUPPORT.COM@SUPPORT.COM
renew until 05/29/18 22:54:43
[root@groot1 hive]# beeline
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
beeline> !connect jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive
Connecting to jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive
Enter username for jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive:
Enter password for jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive:
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://groot1.openstacklocal:8443/> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (8.169 seconds)
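Since the topology also maps the WEBHDFS service, you can exercise the same HadoopAuth Kerberos flow outside of beeline with curl. A sketch, assuming the gateway host, port, and topology name from the example above; it requires a curl build with SPNEGO support and a valid TGT from kinit, and note that hadoop.auth.config.cookie.path in this topology is scoped to the Hive path, so the auth cookie won't be reused across services:

```
# List the HDFS root through the Knox "kerberos" topology
# -k: skip TLS verification for a self-signed gateway cert (lab use only)
# --negotiate -u : : authenticate with the Kerberos ticket from kinit
curl -k --negotiate -u : \
  "https://groot1.openstacklocal:8443/gateway/kerberos/webhdfs/v1/?op=LISTSTATUS"
```

A successful call returns a JSON FileStatuses listing; a 401 usually means the SPNEGO handshake failed (check your TGT and the gateway's kerberos.principal/keytab settings).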
04-24-2018
09:04 PM
Keep in mind that the Taxonomy feature is still in Tech Preview (i.e., not recommended for production use) and is not supported. Taxonomy is expected to be production ready (GA) in HDP 3.0.
11-15-2017
10:50 PM
Can you have multiple nested groups? Say you have some nested groups in ou=groups and ou=groups2. If you set the base to ou=groups,dc=test,dc=com;ou=groups2,dc=test,dc=test,dc=com, will it pick up the hierarchy levels for each OU?
08-29-2017
10:50 PM
This may not work depending on your version because of this bug: https://issues.apache.org/jira/browse/RANGER-1554. It should be fixed in HDP 2.6.1.
04-13-2017
05:44 PM
2 Kudos
delete-scripts.zip - Attached are scripts you can use for Oracle, PostgreSQL, and MySQL to delete users. Directions on how these scripts work are included in each script. As good practice, back up your database before attempting to use and test them.
03-06-2017
09:11 PM
@badr bakkou This would probably be best answered if you submitted it as a new question. Provide the gateway.log and gateway-audit.log outputs, your topology, and lastly the connection string you are using along with its associated output. Best regards, David
03-06-2017
08:56 PM
@Hajime It is not mandatory for WebHDFS to work. However, it is good practice to make this change in a NameNode HA environment, as other services like Oozie use it for doing rewrites.