Member since 10-20-2015: 92 Posts, 78 Kudos Received, 9 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4171 | 06-25-2018 04:01 PM |
| | 7017 | 05-09-2018 05:36 PM |
| | 2470 | 03-16-2018 04:11 PM |
| | 7705 | 05-18-2017 12:42 PM |
| | 6448 | 03-28-2017 06:42 PM |
05-17-2019
06:50 PM
Customers have asked me how to review the Ranger audit archive logs stored on HDFS, because the Ranger UI only shows the last 90 days of audit data from Infra Solr. I decided to approach the problem with Zeppelin and Spark as a fun example.
1. Prerequisites: Zeppelin and Spark2 installed on your system, plus Ranger with its audit logs being stored in HDFS. Create an HDFS policy in Ranger that allows your zeppelin user to read and execute recursively on the /ranger/audit directory.
2. Create a notebook in Zeppelin and add code like the following example:
%spark2.spark
// --Specify service and date if you wish
//val path = "/ranger/audit/hdfs/20190513/*.log"
// --Be brave and map the whole enchilada
val path = "/ranger/audit/*/*/*.log"
// --Read in the JSON and drop any malformed records
val rauditDF = spark.read.option("mode", "DROPMALFORMED").json(path)
// --Print the schema to review it and show the first 20 rows
rauditDF.printSchema()
rauditDF.show(20, false)
// --Run some Spark SQL on the data and look for denials (result = 0)
println("sparksql--------------------")
rauditDF.createOrReplaceTempView("audit")
// --The Zeppelin/spark-shell session may already import these; importing them explicitly is harmless
import org.apache.spark.sql.functions.{col, when}
val readAccessDF = spark.sql("SELECT reqUser, repo, access, action, evtTime, policy, resource, reason, enforcer, result FROM audit WHERE result = 0").
  withColumn("new_result", when(col("result") === 1, "Allowed").otherwise("Denied"))
readAccessDF.show(20, false)
3. The output should look something like this:
path: String = /ranger/audit/*/*/*.log
rauditDF: org.apache.spark.sql.DataFrame = [access: string, action: string ... 23 more fields]
root
|-- access: string (nullable = true)
|-- action: string (nullable = true)
|-- additional_info: string (nullable = true)
|-- agentHost: string (nullable = true)
|-- cliIP: string (nullable = true)
|-- cliType: string (nullable = true)
|-- cluster_name: string (nullable = true)
|-- enforcer: string (nullable = true)
|-- event_count: long (nullable = true)
|-- event_dur_ms: long (nullable = true)
|-- evtTime: string (nullable = true)
|-- id: string (nullable = true)
|-- logType: string (nullable = true)
|-- policy: long (nullable = true)
|-- reason: string (nullable = true)
|-- repo: string (nullable = true)
|-- repoType: long (nullable = true)
|-- reqData: string (nullable = true)
|-- reqUser: string (nullable = true)
|-- resType: string (nullable = true)
|-- resource: string (nullable = true)
|-- result: long (nullable = true)
|-- seq_num: long (nullable = true)
|-- sess: string (nullable = true)
|-- tags: array (nullable = true)
| |-- element: string (containsNull = true)
sparksql--------------------
readAccessDF: org.apache.spark.sql.DataFrame = [reqUser: string, repo: string ... 9 more fields]
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+
|reqUser |repo        |access      |action |evtTime                |policy|resource                                                                             |reason                            |enforcer  |result|new_result|
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+
|dav     |c3205_hadoop|READ_EXECUTE|execute|2019-05-13 22:07:23.971|-1    |/ranger/audit/hdfs                                                                   |/ranger/audit/hdfs                |hadoop-acl|0     |Denied    |
|zeppelin|c3205_hadoop|READ_EXECUTE|execute|2019-05-13 22:10:47.288|-1    |/ranger/audit/hdfs                                                                   |/ranger/audit/hdfs                |hadoop-acl|0     |Denied    |
|dav     |c3205_hadoop|EXECUTE     |execute|2019-05-13 23:57:49.410|-1    |/ranger/audit/hiveServer2/20190513/hiveServer2_ranger_audit_c3205-node3.hwx.local.log|/ranger/audit/hiveServer2/20190513|hadoop-acl|0     |Denied    |
|zeppelin|c3205_hive  |USE         |_any   |2019-05-13 23:42:50.643|-1    |null                                                                                 |null                              |ranger-acl|0     |Denied    |
|zeppelin|c3205_hive  |USE         |_any   |2019-05-13 23:43:08.732|-1    |default                                                                              |null                              |ranger-acl|0     |Denied    |
|dav     |c3205_hive  |USE         |_any   |2019-05-13 23:48:37.603|-1    |null                                                                                 |null                              |ranger-acl|0     |Denied    |
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+
4. You can also run SQL directly against the "audit" temporary view if you so desire.
5. You may need to tune the Spark interpreter in Zeppelin to fit your data volume, for example SPARK_DRIVER_MEMORY, spark.executor.cores, spark.executor.instances, and spark.executor.memory. Tailing the Zeppelin Spark interpreter log helped me see what was happening (tail -f works as well):
tailf zeppelin-interpreter-spark2-spark-zeppelin-cluster1.hwx.log
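If you want to sanity-check the filtering on a small local copy of an audit file before pointing Spark at all of HDFS, the same steps can be sketched in plain Python. This is a minimal sketch, not the notebook itself. The helper names audit_paths, label and denied_events are hypothetical. The sketch assumes the /ranger/audit/<service>/<yyyyMMdd>/*.log layout and one JSON event per line with the field names from the schema above. Building one glob per day is one way to narrow the read instead of mapping the whole enchilada, because Spark's DataFrameReader.json also accepts several paths.

```python
import json
from datetime import date, timedelta

# Hypothetical helpers sketching the notebook logic in plain Python.
# Assumptions: audit files live under <root>/<service>/<yyyyMMdd>/*.log and
# each line is one JSON object using the field names from the printed schema.


def audit_paths(service, start, end, root="/ranger/audit"):
    """Return one glob per day from start to end (inclusive)."""
    return [
        f"{root}/{service}/{(start + timedelta(days=i)).strftime('%Y%m%d')}/*.log"
        for i in range((end - start).days + 1)
    ]


def label(event):
    """Mirror the new_result column: result 1 is Allowed, anything else Denied."""
    return "Allowed" if event.get("result") == 1 else "Denied"


def denied_events(lines):
    """Parse JSON lines, skip malformed ones, and keep denials (result == 0)."""
    denied = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # roughly like DROPMALFORMED: skip the bad record, keep going
        if isinstance(event, dict) and event.get("result") == 0:
            denied.append({**event, "new_result": label(event)})
    return denied


if __name__ == "__main__":
    print(audit_paths("hdfs", date(2019, 5, 12), date(2019, 5, 13)))
    sample = [
        '{"reqUser": "dav", "access": "READ_EXECUTE", "result": 0}',
        '{"reqUser": "zeppelin", "access": "USE", "result": 1}',
        "not json",
    ]
    for row in denied_events(sample):
        print(row["reqUser"], row["access"], row["new_result"])
    # prints: dav READ_EXECUTE Denied
```

Skipping lines that fail to parse only approximates Spark's DROPMALFORMED mode. Treat it as a quick local check, not a replacement for the Spark job.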
12-10-2018
04:57 PM
If you are on a newer version of Ambari, I recommend taking advantage of the FreeIPA option (basically AD for Red Hat).
10-15-2018
03:23 PM
My pleasure! @Jasper
06-25-2018
04:01 PM
@Pankaj Singh Try this one. https://github.com/emaxwell-hw/Atlas-Ranger-Tag-Security
05-23-2018
02:22 AM
Example topology for Kerberos authentication and Hive:
[root@groot1 hive]# cat /etc/knox/2.6.0.3-8/0/topologies/kerberos.xml
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>HadoopAuth</name>
      <enabled>true</enabled>
      <param>
        <name>config.prefix</name>
        <value>hadoop.auth.config</value>
      </param>
      <param>
        <name>hadoop.auth.config.signature.secret</name>
        <value>hadoop12345!</value>
      </param>
      <param>
        <name>hadoop.auth.config.type</name>
        <value>kerberos</value>
      </param>
      <param>
        <name>hadoop.auth.config.simple.anonymous.allowed</name>
        <value>false</value>
      </param>
      <param>
        <name>hadoop.auth.config.token.validity</name>
        <value>1800</value>
      </param>
      <param>
        <name>hadoop.auth.config.cookie.domain</name>
        <value>openstacklocal</value>
      </param>
      <param>
        <name>hadoop.auth.config.cookie.path</name>
        <value>/gateway/kerberos/hive</value>
      </param>
      <param>
        <name>hadoop.auth.config.kerberos.principal</name>
        <value>HTTP/groot1.openstacklocal@SUPPORT.COM</value>
      </param>
      <param>
        <name>hadoop.auth.config.kerberos.keytab</name>
        <value>/etc/security/keytabs/spnego.service.keytab</value>
      </param>
      <param>
        <name>hadoop.auth.config.kerberos.name.rules</name>
        <value>DEFAULT</value>
      </param>
    </provider>
    <provider>
      <role>identity-assertion</role>
      <name>Default</name>
      <enabled>true</enabled>
    </provider>
    <provider>
      <role>authorization</role>
      <name>AclsAuthz</name>
      <enabled>false</enabled>
    </provider>
  </gateway>
  <service>
    <role>NAMENODE</role>
    <url>hdfs://groot1.openstacklocal:8020</url>
  </service>
  <service>
    <role>JOBTRACKER</role>
    <url>rpc://master2.openstacklocal:8050</url>
  </service>
  <service>
    <role>WEBHDFS</role>
    <url>http://groot1.openstacklocal:50070/webhdfs</url>
  </service>
  <service>
    <role>WEBHCAT</role>
    <url>http://master2.openstacklocal:50111/templeton</url>
  </service>
  <service>
    <role>HIVE</role>
    <url>http://groot1.openstacklocal:10001/cliservice</url>
  </service>
  <service>
    <role>RESOURCEMANAGER</role>
    <url>http://master2.openstacklocal:8088/ws</url>
  </service>
</topology>
Example of how to use it (don't forget to configure the Knox proxy user settings in core-site.xml, and if you run into trouble, restart both Hive and Knox):
[root@groot1 hive]# kinit dvillarreal
Password for dvillarreal@SUPPORT.COM:
[root@groot1 hive]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: dvillarreal@SUPPORT.COM
Valid starting     Expires            Service principal
05/22/18 22:54:43  05/23/18 08:54:40  krbtgt/SUPPORT.COM@SUPPORT.COM
        renew until 05/29/18 22:54:43
[root@groot1 hive]# beeline
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
beeline> !connect jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive
Connecting to jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive
Enter username for jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive:
Enter password for jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive:
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://groot1.openstacklocal:8443/> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (8.169 seconds)
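The JDBC URL in the beeline session is built from several parts that map back to the topology:

- the Knox gateway host and port
- ssl=true
- the HTTP service principal
- transportMode=http
- httpPath=gateway/<topology>/hive

The topology name comes from the topology file name, so kerberos.xml gives kerberos. Below is a small Python sketch of that mapping. The helper names are hypothetical, and 8443 is assumed as the gateway port, as in the session above.

```python
import os


def topology_name(topology_file):
    """Knox names a topology after its file: .../topologies/kerberos.xml -> kerberos."""
    return os.path.splitext(os.path.basename(topology_file))[0]


def knox_hive_jdbc_url(gateway_host, topology, realm, port=8443):
    """Build a HiveServer2 JDBC URL that goes through a Knox topology.

    Mirrors the beeline URL above: SSL to the gateway, the HTTP service
    principal, HTTP transport mode, and httpPath gateway/<topology>/hive.
    """
    params = [
        "ssl=true",
        f"principal=HTTP/_HOST@{realm}",
        "transportMode=http",
        f"httpPath=gateway/{topology}/hive",
    ]
    return f"jdbc:hive2://{gateway_host}:{port}/;" + ";".join(params)


if __name__ == "__main__":
    name = topology_name("/etc/knox/2.6.0.3-8/0/topologies/kerberos.xml")
    print(knox_hive_jdbc_url("groot1.openstacklocal", name, "SUPPORT.COM"))
    # prints the same URL used with !connect above
```

If you rename the topology file, the httpPath in the URL and the hadoop.auth.config.cookie.path value in the topology both have to follow.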
05-16-2018
08:27 PM
This Ranger JIRA actually depends on a Hive JIRA being fixed first.
05-09-2018
05:36 PM
@Bhushan Kandalkar When I looked at the original error in your Knox gateway.log, I saw:
dispatching request: http://hadmgrndcc03-3.test.org:10001/cliservice?user.name=guest org.apache.http.NoHttpResponseException:
The gateway-audit.log should show this as well: on dispatch, Knox has a problem communicating with Hive. This tells me you never changed the Hive service URL in your Knox topology from http to https. Make sure the topology tells Knox to use https rather than http when communicating with Hive.
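The check described above asks whether the HIVE service URL in the topology uses the right protocol, and it is easy to script. Below is a minimal Python sketch. It assumes you already know whether HiveServer2 has SSL enabled, for example from hive.server2.use.SSL in hive-site.xml. The names service_urls and hive_scheme_mismatches and the hive_uses_ssl flag are hypothetical. The sample topology reuses the host from the log line above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def service_urls(topology_xml, role):
    """Return the <url> of every <service> in a Knox topology with the given <role>."""
    root = ET.fromstring(topology_xml)
    return [
        service.findtext("url", default="").strip()
        for service in root.findall("service")
        if service.findtext("role", default="").strip() == role
    ]


def hive_scheme_mismatches(topology_xml, hive_uses_ssl):
    """List HIVE service URLs whose scheme disagrees with HiveServer2's SSL setting."""
    expected = "https" if hive_uses_ssl else "http"
    return [
        url
        for url in service_urls(topology_xml, "HIVE")
        if urlparse(url).scheme != expected
    ]


if __name__ == "__main__":
    topology = """<topology>
      <service>
        <role>HIVE</role>
        <url>http://hadmgrndcc03-3.test.org:10001/cliservice</url>
      </service>
    </topology>"""
    # HiveServer2 has SSL on, but Knox would still dispatch over plain http
    print(hive_scheme_mismatches(topology, hive_uses_ssl=True))
```

An empty list means every HIVE entry already uses the expected scheme. Any URL that is returned is a candidate for the http to https change.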
04-24-2018
09:04 PM
Keep in mind that the Taxonomy feature is still in Tech Preview (i.e., not recommended for production use) and is not supported. Taxonomy will be production-ready (GA) in HDP 3.0.
03-16-2018
04:37 PM
Hi @Karl Fredrickson, check this out: https://github.com/rajkrrsingh/HiveServer2JDBCSample/blob/master/src/main/java/HiveJDBCOverHTTP.java Hope it helps.