  • Enable a Ranger plugin with auditing to HDFS for a Hadoop component, in this case HiveServer2.
  • Audit files are stored in the folder structure defined in the audit configuration file of the respective Ranger plugin. The default format is:

/ranger/audit/<hadoop-component>/<YYYYMMDD>/<component>_ranger_audit_<hostname>.<count>.log

Example audit log listing for the HiveServer2 Ranger plugin:

hdfs dfs -ls -R /ranger/audit/hiveServer2
/ranger/audit/hiveServer2/20160315
/ranger/audit/hiveServer2/20160315/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160315/hive_ranger_audit_.2.log
/ranger/audit/hiveServer2/20160316
/ranger/audit/hiveServer2/20160316/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160316/hive_ranger_audit_.2.log
/ranger/audit/hiveServer2/20160317
/ranger/audit/hiveServer2/20160317/hive_ranger_audit_.1.log
/ranger/audit/hiveServer2/20160317/hive_ranger_audit_.2.log
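Each audit file contains one JSON record per line, and the field names match the Hive table columns used later in this article. An illustrative record (the values below are made up for illustration, not taken from a real cluster):

```
{"resource":"default/testtable","resType":"@table","reqUser":"alice","evtTime":"2016-03-15 10:12:01.345","policy":5,"access":"SELECT","result":1,"reason":"","enforcer":"ranger-acl","repoType":3,"repo":"cluster_hive","cliIP":"10.0.0.12","action":"select","agentHost":"hive-host","logType":"RangerAudit","id":"0a1b2c3d-…"}
```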

Procedure to Store Ranger Audit Logs in Hive

  • Create a Hive external table with a dummy location for input. (Hive does not allow TEMPORARY tables to be EXTERNAL or partitioned, so a plain EXTERNAL table is used; the real audit data is attached later as per-day partitions.)

DROP TABLE IF EXISTS ranger_audit_event_json_tmp;
CREATE EXTERNAL TABLE ranger_audit_event_json_tmp (
  resource string,
  resType string,
  reqUser string,
  evtTime TIMESTAMP,
  policy int,
  access string,
  result int,
  reason string,
  enforcer string,
  repoType int,
  repo string,
  cliIP string,
  action string,
  agentHost string,
  logType string,
  id string
)
PARTITIONED BY (evtDate string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/dummy/location';
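The JsonSerDe class above ships in the hive-hcatalog-core jar, which is not always on Hive's classpath. If Hive reports that it cannot find the SerDe class, add the jar to the session first; the path below is typical for an HDP install and may differ on your cluster:

```sql
-- Assumption: jar location varies by distribution and version; adjust to your install.
ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;
```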

  • Alter the table to add a partition for each day. This attaches each day's audit directory to the table and makes that day's data queryable in Hive.

ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='20160315') LOCATION '/ranger/audit/hdfs/20160315';
ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='20160316') LOCATION '/ranger/audit/hdfs/20160316';

  • Script to automate partition creation.

Create a shell script "create_ranger_audit_in_hive.sh":

#!/bin/bash
# Directory lines in the hdfs dfs -ls output start with "d"; the path is the last field.
audit_files=$(hdfs dfs -ls /ranger/audit/hdfs | awk '/^d/ {print $NF}')

beeline -u "jdbc:hive2://rmani-cluser1:10000/default;principal=hive/rmani-cluser1@EXAMPLE.COM" -e "CREATE EXTERNAL TABLE ranger_audit_event_json_tmp ( resource string, resType string, reqUser string, evtTime TIMESTAMP, policy int, access string, result int, reason string, enforcer string, repoType int, repo string, cliIP string, action string, agentHost string, logType string, id string ) PARTITIONED BY (evtDate string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/dummy/location'"

for file in $audit_files
do
    # /ranger/audit/hdfs/20160315 -> 20160315 (the 5th "/"-separated field)
    partition=$(echo "$file" | cut -d"/" -f5)
    echo "${partition}"
    beeline -u "jdbc:hive2://rmani-cluser1:10000/default;principal=hive/rmani-cluser1@EXAMPLE.COM" -e "ALTER TABLE ranger_audit_event_json_tmp ADD PARTITION (evtDate='${partition}') LOCATION '/ranger/audit/hdfs/${partition}'"
done
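The partition-extraction step in the loop can be checked on its own; a minimal sketch using a hypothetical path (not taken from a real listing):

```shell
# Hypothetical audit directory path, as it would appear in the hdfs dfs -ls output.
file=/ranger/audit/hdfs/20160315

# Splitting on "/" gives: (empty), ranger, audit, hdfs, 20160315 -> field 5 is the date.
partition=$(echo "$file" | cut -d"/" -f5)
echo "$partition"
```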

  • ORC format for the audit data.

DROP TABLE IF EXISTS ranger_audit_event;
CREATE TABLE ranger_audit_event (
  resource string,
  resType string,
  reqUser string,
  evtTime TIMESTAMP,
  policy int,
  access string,
  result int,
  reason string,
  enforcer string,
  repoType int,
  repo string,
  cliIP string,
  action string,
  agentHost string,
  logType string,
  id string
)
STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");

Optionally, create a compact index on the id column. (Note: Hive indexes were removed in Hive 3.0; on newer releases skip this step and rely on ORC's built-in indexes instead.)

CREATE INDEX i_id
ON TABLE ranger_audit_event (id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- The JSON-backed table also exposes the evtDate partition column, so SELECT *
-- yields 17 columns against the 16 above; list the data columns explicitly.
INSERT INTO TABLE ranger_audit_event
SELECT resource, resType, reqUser, evtTime, policy, access, result, reason,
       enforcer, repoType, repo, cliIP, action, agentHost, logType, id
FROM ranger_audit_event_json_tmp;
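Once loaded, the ORC table can be queried like any other Hive table. A sample report, assuming (per Ranger's audit schema) that result = 0 denotes a denied request:

```sql
-- Denied accesses per user and day, most-denied first.
SELECT reqUser, to_date(evtTime) AS day, COUNT(*) AS denied
FROM ranger_audit_event
WHERE result = 0
GROUP BY reqUser, to_date(evtTime)
ORDER BY denied DESC;
```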
