Enable a Ranger plugin with auditing to HDFS for a Hadoop component, in this case HiveServer2. Audit files are stored in the folder structure defined in the audit configuration file of the respective Ranger plugin.
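Which HDFS directory the plugin writes to is controlled by that audit configuration. On recent HDP/Ranger versions this is typically the xasecure.audit.destination.hdfs.dir property in the component's ranger-<component>-audit.xml; the exact property name and file vary by Ranger version, so verify on your cluster. Shown here as a name/value sketch:

xasecure.audit.destination.hdfs.dir = hdfs://<namenode>:8020/ranger/audit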
The default layout is:

/ranger/audit/<hadoop-component>/<YYYYMMDD>/<component>_ranger_audit_<hostname>.<count>.log

Example audit logs for the HiveServer2 Ranger plugin:

hdfs dfs -ls -R /ranger/audit/hiveServer2
/ranger/audit/hiveServer2/20160315
/ranger/audit/hiveServer2/20160315/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160315/hive_ranger_audit_.2.log
/ranger/audit/hiveServer2/20160316
/ranger/audit/hiveServer2/20160316/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160316/hive_ranger_audit_.2.log
/ranger/audit/hiveServer2/20160317
/ranger/audit/hiveServer2/20160317/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160317/hive_ranger_audit_.2.log

Procedure to Store Ranger Audit Logs in Hive
Create a Hive external table with a dummy location for input. Note that Hive temporary tables cannot be partitioned, so the staging table is created as a regular external table (matching the automation script below):

DROP TABLE IF EXISTS ranger_audit_event_json_tmp;

CREATE EXTERNAL TABLE ranger_audit_event_json_tmp (
  resource string,
  resType string,
  reqUser string,
  evtTime TIMESTAMP,
  policy int,
  access string,
  result int,
  reason string,
  enforcer string,
  repoType int,
  repo string,
  cliIP string,
  action string,
  agentHost string,
  logType string,
  id string )
PARTITIONED BY (evtDate string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/dummy/location';
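If the CREATE TABLE fails because Hive cannot resolve org.apache.hive.hcatalog.data.JsonSerDe, the HCatalog core JAR usually needs to be added to the session first. The path below is an assumption based on a typical HDP layout; locate the JAR on your own cluster if it differs:

-- Path is an assumption for a standard HDP install; adjust as needed
ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;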
Alter the external table to add a partition for each date. This is what makes each day's log files queryable from the Hive table:

ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='20160315') LOCATION '/ranger/audit/hdfs/20160315';

ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='20160316') LOCATION '/ranger/audit/hdfs/20160316';
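After adding a partition, a quick sanity check confirms that it registered and that the JsonSerDe is parsing the audit records (an illustrative check, not part of the original procedure):

SHOW PARTITIONS ranger_audit_event_json_tmp;

SELECT evtTime, reqUser, access, result
FROM ranger_audit_event_json_tmp
WHERE evtDate = '20160315'
LIMIT 10;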
Script to automate

Create a shell script "create_ranger_audit_in_hive.sh":

#!/bin/bash

# Collect the per-day audit directories (the path is the 19th
# space-separated field of this ls output; adjust if spacing differs)
cmd=`hdfs dfs -ls /ranger/audit/hdfs | cut -d" " -f19`
audit_file=`echo $cmd`

# Recreate the external staging table over the JSON audit logs
beeline -u "jdbc:hive2://rmani-cluser1:10000/default;principal=hive/rmani-cluser1@EXAMPLE.COM" \
  -e "DROP TABLE IF EXISTS ranger_audit_event_json_tmp;
      CREATE EXTERNAL TABLE ranger_audit_event_json_tmp (
        resource string, resType string, reqUser string, evtTime TIMESTAMP,
        policy int, access string, result int, reason string,
        enforcer string, repoType int, repo string, cliIP string,
        action string, agentHost string, logType string, id string )
      PARTITIONED BY (evtDate string)
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
      LOCATION '/dummy/location'"

# Register one partition per dated directory (field 5 of the
# path /ranger/audit/hdfs/YYYYMMDD is the date component)
for file in $audit_file
do
  partition=`echo $file | cut -d"/" -f5`
  echo ${partition}
  beeline -u "jdbc:hive2://rmani-cluser1:10000/default;principal=hive/rmani-cluser1@EXAMPLE.COM" \
    -e "ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='${partition}') LOCATION '/ranger/audit/hdfs/${partition}'"
done
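One fragile spot in the script above is cut -d" " -f19: the column position in hdfs dfs -ls output varies with file sizes and cluster settings. A slightly more robust variant (a sketch, assuming the same /ranger/audit/hdfs layout) takes the last whitespace-separated field instead:

#!/bin/bash
# List the dated audit directories; $NF is the last field (the path),
# and the grep drops the "Found N items" header line
for dir in $(hdfs dfs -ls /ranger/audit/hdfs | awk '{print $NF}' | grep '^/ranger/audit/hdfs/')
do
  partition=$(basename "$dir")   # e.g. 20160315
  echo "${partition}"
  # ... run the same beeline ALTER TABLE ... ADD PARTITION as above ...
done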
ORC format for the audit data:

DROP TABLE IF EXISTS ranger_audit_event;

CREATE TABLE ranger_audit_event (
  resource string,
  resType string,
  reqUser string,
  evtTime TIMESTAMP,
  policy int,
  access string,
  result int,
  reason string,
  enforcer string,
  repoType int,
  repo string,
  cliIP string,
  action string,
  agentHost string,
  logType string,
  id string )
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

CREATE INDEX i_id ON TABLE ranger_audit_event (id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

Load the staged JSON data into the ORC table. The staging table carries the extra evtDate partition column, so the columns are listed explicitly rather than using SELECT *:

INSERT INTO TABLE ranger_audit_event
SELECT resource, resType, reqUser, evtTime, policy, access, result, reason,
       enforcer, repoType, repo, cliIP, action, agentHost, logType, id
FROM ranger_audit_event_json_tmp;
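With the data in ORC, the audit history can be queried efficiently. As a sketch, the following reports the users with the most denied requests; it assumes the usual Ranger audit convention that result = 1 means allowed and 0 means denied, which is worth verifying against your own records:

SELECT reqUser, COUNT(*) AS denied_count
FROM ranger_audit_event
WHERE result = 0
GROUP BY reqUser
ORDER BY denied_count DESC
LIMIT 10;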
By default, Ranger uses the log4j DailyRollingFileAppender to manage rotation of log files. This is the appender used by many HDP components, and it has no concept of a maximum number of files to keep. The advantage of this appender is that it is easier to locate a log entry, since files are split on date boundaries. The main disadvantage is that there is no way to limit the amount of log data that is kept. There are two common options for dealing with this.

1. Keep the DailyRollingFileAppender, and trim logs using a cron script. The following script keeps only 30 days' worth of logs (a dry-run check is sketched at the end of this article):

#!/bin/bash
find /var/log/ranger -mtime +30 | xargs --no-run-if-empty rm

You can change the +30 to +number_of_days. This script could be installed, for example, in a file called /etc/cron.daily/trim_ranger_logs, so that it runs each morning and removes logs older than 30 days.

2. Change to the RollingFileAppender. To do this you need to modify the log4j.xml file for the appropriate component:

ranger-admin: /usr/hdp/<version>/ranger-admin/ews/webapp/WEB-INF/log4j.xml
ranger-usersync: /etc/ranger/usersync/conf/log4j.xml

Change these files as follows:

1. Change the appender definitions which use org.apache.log4j.DailyRollingFileAppender to use org.apache.log4j.RollingFileAppender.
2. In those same appenders, remove the datePattern parameter and replace it with two parameters:
maxBackupIndex - the number of rotated files to keep
maxFileSize - the maximum size a file grows to before it is rotated

For example, the xa_log_appender from /usr/hdp/<version>/ranger-admin/ews/webapp/WEB-INF/log4j.xml, suitably modified to rotate files after they grow to a size of 1MB and keep 30 of them, looks like this (note the class attribute now names the RollingFileAppender):

<appender name="xa_log_appender" class="org.apache.log4j.RollingFileAppender">
<param name="file" value="${catalina.base}/logs/xa_portal.log" />
<param name="maxFileSize" value="1MB" />
<param name="maxBackupIndex" value="30" />
<param name="append" value="true" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d [%t] %-5p %C{6} (%F:%L) - %m%n" /> </layout>
</appender>

Note that the default for maxFileSize is 10MB. After you do this you need to restart the Ranger services. The advantage of the RollingFileAppender is a predictable log footprint. The disadvantage is that your logs are no longer cleanly broken on date boundaries, which might lead you to spend more time searching your logs in case of a problem.
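Before installing the trim script from option 1 as a cron job, it is worth previewing what it would delete. The same find expression with -print instead of the rm pipeline lists the candidate files without removing anything:

#!/bin/bash
# Dry run: list Ranger logs older than 30 days without deleting them
find /var/log/ranger -mtime +30 -print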