10-11-2016
12:44 PM
3 Kudos
Enable a Ranger plugin with audit to HDFS for a Hadoop component, in this case HiveServer2. Audit files are stored in the folder structure defined in the audit configuration file of the respective Ranger plugin.

The default format is:

/ranger/audit/<hadoop-component>/<YYYYMMDD>/<component>_ranger_audit_<hostname>.<count>.log

Example audit logs for the HiveServer2 Ranger plugin:

hdfs dfs -ls -R /ranger/audit/hiveServer2
/ranger/audit/hiveServer2/20160315
/ranger/audit/hiveServer2/20160315/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160315/hive_ranger_audit_.2.log
/ranger/audit/hiveServer2/20160316
/ranger/audit/hiveServer2/20160316/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160316/hive_ranger_audit_.2.log
/ranger/audit/hiveServer2/20160317
/ranger/audit/hiveServer2/20160317/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160317/hive_ranger_audit_.2.log

Procedure to Store Ranger Audit Logs in Hive
Create a Hive external table with a dummy location for input. (A regular external table is used here rather than a temporary table, because Hive temporary tables do not support partitioning.)

DROP TABLE IF EXISTS ranger_audit_event_json_tmp;
CREATE EXTERNAL TABLE ranger_audit_event_json_tmp (
  resource string,
  resType string,
  reqUser string,
  evtTime TIMESTAMP,
  policy int,
  access string,
  result int,
  reason string,
  enforcer string,
  repoType int,
  repo string,
  cliIP string,
  action string,
  agentHost string,
  logType string,
  id string )
PARTITIONED BY (evtDate string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/dummy/location';

Alter the table to add a partition for each day. This is needed to load each day's log files into the Hive table; adding a partition makes that day's audit data visible in the table:

ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='20160315') LOCATION '/ranger/audit/hdfs/20160315';
ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='20160316') LOCATION '/ranger/audit/hdfs/20160316';
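The JsonSerDe works here because each line of a Ranger audit file is a single JSON object whose keys match the column names of the external table. A hypothetical record (all values invented for illustration) might look like:

```json
{"resource": "/apps/hive/warehouse/sales", "resType": "path",
 "reqUser": "someuser", "evtTime": "2016-03-15 10:22:11.123",
 "policy": 12, "access": "READ", "result": 1, "reason": "",
 "enforcer": "ranger-acl", "repoType": 1, "repo": "cluster1_hadoop",
 "cliIP": "10.0.0.5", "action": "read", "agentHost": "somehost",
 "logType": "RangerAudit", "id": "7f2b1c9e-0001"}
```

Any key missing from a given record is simply read as NULL in the corresponding column.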
Script to automate this. Create a shell script "create_ranger_audit_in_hive.sh":

#!/bin/bash

cmd=`hdfs dfs -ls /ranger/audit/hdfs | cut -d" " -f19`
audit_file=`echo $cmd`

beeline -u "jdbc:hive2://rmani-cluser1:10000/default;principal=hive/rmani-cluser1@EXAMPLE.COM" \
  -e "CREATE EXTERNAL TABLE ranger_audit_event_json_tmp ( resource string, resType string, reqUser string, evtTime TIMESTAMP, policy int, access string, result int, reason string, enforcer string, repoType int, repo string, cliIP string, action string, agentHost string, logType string, id string ) PARTITIONED BY (evtDate string) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/dummy/location'"

for file in $audit_file
do
  partition=`echo $file | cut -d"/" -f5`
  echo ${partition}
  beeline -u "jdbc:hive2://rmani-cluser1:10000/default;principal=hive/rmani-cluser1@EXAMPLE.COM" \
    -e "ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='${partition}') LOCATION '/ranger/audit/hdfs/${partition}'"
done
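The cut -d"/" -f5 in the loop relies on the fixed depth of the audit path: because the path starts with "/", field 1 is empty and the date directory is field 5. A small stand-alone sketch of that extraction (the path is a sample in the same layout as the listing above):

```shell
#!/bin/bash
# Fields of "/ranger/audit/hdfs/20160315/...": "", ranger, audit, hdfs, 20160315, ...
path="/ranger/audit/hdfs/20160315/hdfs_ranger_audit_.1.log"
partition=$(echo "$path" | cut -d"/" -f5)
echo "$partition"   # prints 20160315
```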
ORC format for the audit data:

DROP TABLE IF EXISTS ranger_audit_event;
CREATE TABLE ranger_audit_event (
  resource string,
  resType string,
  reqUser string,
  evtTime TIMESTAMP,
  policy int,
  access string,
  result int,
  reason string,
  enforcer string,
  repoType int,
  repo string,
  cliIP string,
  action string,
  agentHost string,
  logType string,
  id string )
STORED AS ORC tblproperties ("orc.compress"="ZLIB");

CREATE INDEX i_id ON TABLE ranger_audit_event (id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

INSERT INTO TABLE ranger_audit_event
SELECT resource, resType, reqUser, evtTime, policy, access, result, reason, enforcer, repoType, repo, cliIP, action, agentHost, logType, id
FROM ranger_audit_event_json_tmp;

(The explicit column list is needed because SELECT * on the JSON table would also return the evtDate partition column, which the ORC table does not have.)
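Once the data is in the ORC table it can be analyzed with ordinary HiveQL. As a hypothetical example (in Ranger audit records, result = 1 marks an allowed access and 0 a denied one), a query counting denied requests per user per day might look like:

```sql
-- Hypothetical example: denied accesses per user per day
SELECT reqUser, to_date(evtTime) AS evtDay, count(*) AS denied
FROM ranger_audit_event
WHERE result = 0
GROUP BY reqUser, to_date(evtTime)
ORDER BY denied DESC;
```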
02-25-2016
07:57 PM
1 Kudo
Great content, thanks for sharing!