Member since 10-20-2015 | 92 Posts | 79 Kudos Received | 9 Solutions
05-17-2019
06:50 PM
1 Kudo
Customers have asked how they can review Ranger audit archive logs stored on HDFS, since the UI (backed by Infra Solr) only shows the last 90 days of data. I decided to approach the problem using Zeppelin/Spark for a fun example.

1. Prerequisites - Zeppelin and Spark2 installed on your system, plus Ranger with its audit logs being stored in HDFS. Create an HDFS policy in Ranger that allows your zeppelin user to read and execute, recursively, on the /ranger/audit directory.

2. Create your notebook in Zeppelin with some code like the following example:

%spark2.spark
// --Specify service and date if you wish
//val path = "/ranger/audit/hdfs/20190513/*.log"
// --Be brave and map the whole enchilada
val path = "/ranger/audit/*/*/*.log"
// --read in the json and drop any malformed json
val rauditDF = spark.read.option("mode", "DROPMALFORMED").json(path)
// --print the schema to review and show me top 20 lines.
rauditDF.printSchema()
rauditDF.show(20,false)
// --Do some spark sql on the data and look for denials
println("sparksql--------------------")
rauditDF.createOrReplaceTempView(viewName="audit")
import org.apache.spark.sql.functions.{col, when}
val readAccessDF = spark.sql("SELECT reqUser, repo, access, action, evtTime, policy, resource, reason, enforcer, result FROM audit WHERE result='0'")
  .withColumn("new_result", when(col("result") === "1", "Allowed").otherwise("Denied"))
readAccessDF.show(20,false)

3. Output should look something like:

path: String = /ranger/audit/*/*/*.log
rauditDF: org.apache.spark.sql.DataFrame = [access: string, action: string ... 23 more fields]
root
|-- access: string (nullable = true)
|-- action: string (nullable = true)
|-- additional_info: string (nullable = true)
|-- agentHost: string (nullable = true)
|-- cliIP: string (nullable = true)
|-- cliType: string (nullable = true)
|-- cluster_name: string (nullable = true)
|-- enforcer: string (nullable = true)
|-- event_count: long (nullable = true)
|-- event_dur_ms: long (nullable = true)
|-- evtTime: string (nullable = true)
|-- id: string (nullable = true)
|-- logType: string (nullable = true)
|-- policy: long (nullable = true)
|-- reason: string (nullable = true)
|-- repo: string (nullable = true)
|-- repoType: long (nullable = true)
|-- reqData: string (nullable = true)
|-- reqUser: string (nullable = true)
|-- resType: string (nullable = true)
|-- resource: string (nullable = true)
|-- result: long (nullable = true)
|-- seq_num: long (nullable = true)
|-- sess: string (nullable = true)
|-- tags: array (nullable = true)
| |-- element: string (containsNull = true)
sparksql--------------------
readAccessDF: org.apache.spark.sql.DataFrame = [reqUser: string, repo: string ... 9 more fields]
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+
|reqUser |repo |access |action |evtTime |policy|resource |reason |enforcer |result|new_result|
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+
|dav |c3205_hadoop|READ_EXECUTE|execute|2019-05-13 22:07:23.971|-1 |/ranger/audit/hdfs |/ranger/audit/hdfs |hadoop-acl|0 |Denied |
|zeppelin|c3205_hadoop|READ_EXECUTE|execute|2019-05-13 22:10:47.288|-1 |/ranger/audit/hdfs |/ranger/audit/hdfs |hadoop-acl|0 |Denied |
|dav |c3205_hadoop|EXECUTE |execute|2019-05-13 23:57:49.410|-1 |/ranger/audit/hiveServer2/20190513/hiveServer2_ranger_audit_c3205-node3.hwx.local.log|/ranger/audit/hiveServer2/20190513|hadoop-acl|0 |Denied |
|zeppelin|c3205_hive |USE |_any |2019-05-13 23:42:50.643|-1 |null |null |ranger-acl|0 |Denied |
|zeppelin|c3205_hive |USE |_any |2019-05-13 23:43:08.732|-1 |default |null |ranger-acl|0 |Denied |
|dav |c3205_hive |USE |_any |2019-05-13 23:48:37.603|-1 |null |null |ranger-acl|0 |Denied |
+--------+------------+------------+-------+-----------------------+------+-------------------------------------------------------------------------------------+----------------------------------+----------+------+----------+

4. You can proceed to run more SQL against the audit view if you so desire.

5. You may need to fine-tune your Spark interpreter in Zeppelin to meet your needs, e.g. SPARK_DRIVER_MEMORY, spark.executor.cores, spark.executor.instances, and spark.executor.memory. It helped to see what was happening by tailing the Zeppelin log for Spark:

tailf zeppelin-interpreter-spark2-spark-zeppelin-cluster1.hwx.log
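If you want to dig further, step 4 above can be extended with an aggregation, for example to see which users rack up the most denials. A minimal sketch for the same %spark2.spark paragraph, reusing the audit temp view created earlier; the column names come from the schema printed above, but the query itself is my own illustration, not part of the original notebook:

```
// --Count denied events per user, repo, and access type, most frequent first
val topDeniedDF = spark.sql("""
  SELECT reqUser, repo, access, COUNT(*) AS denials
  FROM audit
  WHERE result = 0
  GROUP BY reqUser, repo, access
  ORDER BY denials DESC
""")
topDeniedDF.show(20, false)
```

This runs against the same DataFrame-backed view, so it inherits whatever date range your path glob selected.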
10-15-2018
03:23 PM
My pleasure! @Jasper
05-23-2018
02:22 AM
2 Kudos
Example topology for Kerberos auth and Hive:

[root@groot1 hive]# cat /etc/knox/2.6.0.3-8/0/topologies/kerberos.xml

<topology>
<gateway>
<provider>
<role>authentication</role>
<name>HadoopAuth</name>
<enabled>true</enabled>
<param>
<name>config.prefix</name>
<value>hadoop.auth.config</value>
</param>
<param>
<name>hadoop.auth.config.signature.secret</name>
<value>hadoop12345!</value>
</param>
<param>
<name>hadoop.auth.config.type</name>
<value>kerberos</value>
</param>
<param>
<name>hadoop.auth.config.simple.anonymous.allowed</name>
<value>false</value>
</param>
<param>
<name>hadoop.auth.config.token.validity</name>
<value>1800</value>
</param>
<param>
<name>hadoop.auth.config.cookie.domain</name>
<value>openstacklocal</value>
</param>
<param>
<name>hadoop.auth.config.cookie.path</name>
<value>/gateway/kerberos/hive</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.principal</name>
<value>HTTP/groot1.openstacklocal@SUPPORT.COM</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.keytab</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
</param>
<param>
<name>hadoop.auth.config.kerberos.name.rules</name>
<value>DEFAULT</value>
</param>
</provider>
<provider>
<role>identity-assertion</role>
<name>Default</name>
<enabled>true</enabled>
</provider>
<provider>
<role>authorization</role>
<name>AclsAuthz</name>
<enabled>false</enabled>
</provider>
</gateway>
<service>
<role>NAMENODE</role>
<url>hdfs://groot1.openstacklocal:8020</url>
</service>
<service>
<role>JOBTRACKER</role>
<url>rpc://master2.openstacklocal:8050</url>
</service>
<service>
<role>WEBHDFS</role>
<url>http://groot1.openstacklocal:50070/webhdfs</url>
</service>
<service>
<role>WEBHCAT</role>
<url>http://master2.openstacklocal:50111/templeton</url>
</service>
<service>
<role>HIVE</role>
<url>http://groot1.openstacklocal:10001/cliservice</url>
</service>
<service>
<role>RESOURCEMANAGER</role>
<url>http://master2.openstacklocal:8088/ws</url>
</service>
</topology>

Example of how to use it (don't forget the Knox proxyuser settings in core-site.xml, and if you run into trouble restart both Hive and Knox):

[root@groot1 hive]# kinit dvillarreal
Password for dvillarreal@SUPPORT.COM:
[root@groot1 hive]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: dvillarreal@SUPPORT.COM
Valid starting Expires Service principal
05/22/18 22:54:43 05/23/18 08:54:40 krbtgt/SUPPORT.COM@SUPPORT.COM
renew until 05/29/18 22:54:43
[root@groot1 hive]# beeline
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
beeline> !connect jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive
Connecting to jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive
Enter username for jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive:
Enter password for jdbc:hive2://groot1.openstacklocal:8443/;ssl=true;principal=HTTP/_HOST@SUPPORT.COM;transportMode=http;httpPath=gateway/kerberos/hive:
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://groot1.openstacklocal:8443/> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (8.169 seconds)
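Since the topology also maps the WEBHDFS service, you can exercise the same HadoopAuth Kerberos flow outside of beeline with curl. A sketch, assuming the gateway host, port, and topology name from the example above; it requires a curl build with SPNEGO support and a valid TGT from kinit, and note that hadoop.auth.config.cookie.path in this topology is scoped to the Hive path, so the auth cookie won't be reused across services:

```
# List the HDFS root through the Knox "kerberos" topology
# -k: skip TLS verification for a self-signed gateway cert (lab use only)
# --negotiate -u : : authenticate with the Kerberos ticket from kinit
curl -k --negotiate -u : \
  "https://groot1.openstacklocal:8443/gateway/kerberos/webhdfs/v1/?op=LISTSTATUS"
```

A successful call returns a JSON FileStatuses listing; a 401 usually means the SPNEGO handshake failed (check your TGT and the gateway's kerberos.principal/keytab settings).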
04-24-2018
09:04 PM
Keep in mind that the Taxonomy feature is still in Tech Preview (i.e., not recommended for production use) and is not supported. Taxonomy is expected to be production ready (GA) in HDP 3.0.
11-15-2017
10:50 PM
Can you have multiple nested groups? Say you have some nested groups in ou=groups and ou=groups2. If you set the base to ou=groups,dc=test,dc=com;ou=groups2,dc=test,dc=test,dc=com, will it pick up the hierarchy levels for each OU?
08-29-2017
10:50 PM
This may not work depending on your version because of this bug: https://issues.apache.org/jira/browse/RANGER-1554. It should be fixed in HDP 2.6.1.
04-13-2017
05:44 PM
2 Kudos
delete-scripts.zip - Attached are scripts you can use for Oracle, PostgreSQL, and MySQL to delete users. Directions on how these scripts work are included in each script. As good practice, back up your database before attempting to use and test them.
03-06-2017
09:11 PM
@badr bakkou This would probably be best answered if you submitted it as a new question. Provide the gateway.log and gateway-audit.log outputs, your topology, and lastly the connection string you are using along with its associated output. Best regards, David
03-06-2017
08:56 PM
@Hajime It is not mandatory for WebHDFS to work. However, it is good practice to make this change in a NameNode HA environment, as other services like Oozie use it for doing rewrites.