Enable a Ranger plugin with auditing to HDFS for a Hadoop component, in this case HiveServer2. Audit files are stored in the folder structure defined in the audit configuration file of the respective Ranger plugin.
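Which HDFS directory the plugin writes to is controlled by that audit configuration. On recent HDP/Ranger versions this is typically the xasecure.audit.destination.hdfs.dir property in the component's ranger-<component>-audit.xml; the exact property name and file vary by Ranger version, so verify on your cluster. Shown here as a name/value sketch:

xasecure.audit.destination.hdfs.dir = hdfs://<namenode>:8020/ranger/audit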
The default layout is:

/ranger/audit/<hadoop-component>/<YYYYMMDD>/<component>_ranger_audit_<hostname>.<count>.log

Example audit logs for the HiveServer2 Ranger plugin:

hdfs dfs -ls -R /ranger/audit/hiveServer2
/ranger/audit/hiveServer2/20160315
/ranger/audit/hiveServer2/20160315/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160315/hive_ranger_audit_.2.log
/ranger/audit/hiveServer2/20160316
/ranger/audit/hiveServer2/20160316/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160316/hive_ranger_audit_.2.log
/ranger/audit/hiveServer2/20160317
/ranger/audit/hiveServer2/20160317/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160317/hive_ranger_audit_.2.log

Procedure to Store Ranger Audit Logs in Hive
Create a Hive external table with a dummy location for input. Note that Hive temporary tables cannot be partitioned, so the staging table is created as a regular external table (matching the automation script below):

DROP TABLE IF EXISTS ranger_audit_event_json_tmp;

CREATE EXTERNAL TABLE ranger_audit_event_json_tmp (
  resource string,
  resType string,
  reqUser string,
  evtTime TIMESTAMP,
  policy int,
  access string,
  result int,
  reason string,
  enforcer string,
  repoType int,
  repo string,
  cliIP string,
  action string,
  agentHost string,
  logType string,
  id string )
PARTITIONED BY (evtDate string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/dummy/location';
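If the CREATE TABLE fails because Hive cannot resolve org.apache.hive.hcatalog.data.JsonSerDe, the HCatalog core JAR usually needs to be added to the session first. The path below is an assumption based on a typical HDP layout; locate the JAR on your own cluster if it differs:

-- Path is an assumption for a standard HDP install; adjust as needed
ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;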
Alter the external table to add a partition for each date. This is what makes each day's log files queryable from the Hive table:

ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='20160315') LOCATION '/ranger/audit/hdfs/20160315';

ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='20160316') LOCATION '/ranger/audit/hdfs/20160316';
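After adding a partition, a quick sanity check confirms that it registered and that the JsonSerDe is parsing the audit records (an illustrative check, not part of the original procedure):

SHOW PARTITIONS ranger_audit_event_json_tmp;

SELECT evtTime, reqUser, access, result
FROM ranger_audit_event_json_tmp
WHERE evtDate = '20160315'
LIMIT 10;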
Script to automate

Create a shell script "create_ranger_audit_in_hive.sh":

#!/bin/bash

# Collect the per-day audit directories (the path is the 19th
# space-separated field of this ls output; adjust if spacing differs)
cmd=`hdfs dfs -ls /ranger/audit/hdfs | cut -d" " -f19`
audit_file=`echo $cmd`

# Recreate the external staging table over the JSON audit logs
beeline -u "jdbc:hive2://rmani-cluser1:10000/default;principal=hive/rmani-cluser1@EXAMPLE.COM" \
  -e "DROP TABLE IF EXISTS ranger_audit_event_json_tmp;
      CREATE EXTERNAL TABLE ranger_audit_event_json_tmp (
        resource string, resType string, reqUser string, evtTime TIMESTAMP,
        policy int, access string, result int, reason string,
        enforcer string, repoType int, repo string, cliIP string,
        action string, agentHost string, logType string, id string )
      PARTITIONED BY (evtDate string)
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
      LOCATION '/dummy/location'"

# Register one partition per dated directory (field 5 of the
# path /ranger/audit/hdfs/YYYYMMDD is the date component)
for file in $audit_file
do
  partition=`echo $file | cut -d"/" -f5`
  echo ${partition}
  beeline -u "jdbc:hive2://rmani-cluser1:10000/default;principal=hive/rmani-cluser1@EXAMPLE.COM" \
    -e "ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='${partition}') LOCATION '/ranger/audit/hdfs/${partition}'"
done
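One fragile spot in the script above is cut -d" " -f19: the column position in hdfs dfs -ls output varies with file sizes and cluster settings. A slightly more robust variant (a sketch, assuming the same /ranger/audit/hdfs layout) takes the last whitespace-separated field instead:

#!/bin/bash
# List the dated audit directories; $NF is the last field (the path),
# and the grep drops the "Found N items" header line
for dir in $(hdfs dfs -ls /ranger/audit/hdfs | awk '{print $NF}' | grep '^/ranger/audit/hdfs/')
do
  partition=$(basename "$dir")   # e.g. 20160315
  echo "${partition}"
  # ... run the same beeline ALTER TABLE ... ADD PARTITION as above ...
done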
ORC format for the audit data:

DROP TABLE IF EXISTS ranger_audit_event;

CREATE TABLE ranger_audit_event (
  resource string,
  resType string,
  reqUser string,
  evtTime TIMESTAMP,
  policy int,
  access string,
  result int,
  reason string,
  enforcer string,
  repoType int,
  repo string,
  cliIP string,
  action string,
  agentHost string,
  logType string,
  id string )
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

CREATE INDEX i_id ON TABLE ranger_audit_event (id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

Load the staged JSON data into the ORC table. The staging table carries the extra evtDate partition column, so the columns are listed explicitly rather than using SELECT *:

INSERT INTO TABLE ranger_audit_event
SELECT resource, resType, reqUser, evtTime, policy, access, result, reason,
       enforcer, repoType, repo, cliIP, action, agentHost, logType, id
FROM ranger_audit_event_json_tmp;
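With the data in ORC, the audit history can be queried efficiently. As a sketch, the following reports the users with the most denied requests; it assumes the usual Ranger audit convention that result = 1 means allowed and 0 means denied, which is worth verifying against your own records:

SELECT reqUser, COUNT(*) AS denied_count
FROM ranger_audit_event
WHERE result = 0
GROUP BY reqUser
ORDER BY denied_count DESC
LIMIT 10;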
By default, Ranger uses the log4j DailyRollingFileAppender to manage rotation of log files. This is the appender used by many HDP components, and it has no concept of a maximum number of files to keep. The advantage of this appender is that it is easier to locate a log entry, since files are split on date boundaries. The main disadvantage is that there is no way to limit the amount of log data that is kept. There are two common options for dealing with this.

1. Keep the DailyRollingFileAppender, and trim logs using a cron script. The following script keeps only 30 days' worth of logs (a dry-run check is sketched at the end of this article):

#!/bin/bash
find /var/log/ranger -mtime +30 | xargs --no-run-if-empty rm

You can change the +30 to +number_of_days. This script could be installed, for example, in a file called /etc/cron.daily/trim_ranger_logs, so that it runs each morning and removes logs older than 30 days.

2. Change to the RollingFileAppender. To do this you need to modify the log4j.xml file for the appropriate component:

ranger-admin: /usr/hdp/<version>/ranger-admin/ews/webapp/WEB-INF/log4j.xml
ranger-usersync: /etc/ranger/usersync/conf/log4j.xml

Change these files as follows:

1. Change the appender definitions which use org.apache.log4j.DailyRollingFileAppender to use org.apache.log4j.RollingFileAppender.
2. In those same appenders, remove the datePattern parameter and replace it with two parameters:
maxBackupIndex - the number of rotated files to keep
maxFileSize - the maximum size a file grows to before it is rotated

For example, the xa_log_appender from /usr/hdp/<version>/ranger-admin/ews/webapp/WEB-INF/log4j.xml, suitably modified to rotate files after they grow to a size of 1MB and keep 30 of them, looks like this (note the class attribute now names the RollingFileAppender):

<appender name="xa_log_appender" class="org.apache.log4j.RollingFileAppender">
<param name="file" value="${catalina.base}/logs/xa_portal.log" />
<param name="maxFileSize" value="1MB" />
<param name="maxBackupIndex" value="30" />
<param name="append" value="true" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d [%t] %-5p %C{6} (%F:%L) - %m%n" /> </layout>
</appender>

Note that the default for maxFileSize is 10MB. After you do this you need to restart the Ranger services. The advantage of the RollingFileAppender is a predictable log footprint. The disadvantage is that your logs are no longer cleanly broken on date boundaries, which might lead you to spend more time searching your logs in case of a problem.
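Before installing the trim script from option 1 as a cron job, it is worth previewing what it would delete. The same find expression with -print instead of the rm pipeline lists the candidate files without removing anything:

#!/bin/bash
# Dry run: list Ranger logs older than 30 days without deleting them
find /var/log/ranger -mtime +30 -print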