Created 11-10-2016 12:59 AM
I need some help creating a Hive table for the format below. This is Ranger audit info, and I will create partitions accordingly. The info below is just one line:
{"repoType":1,"repo":"abc_hadoop","reqUser":"ams","evtTime":"2016-09-19 13:14:40.197","access":"READ","resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","resType":"path","result":1,"policy":41,"reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","enforcer":"ranger-acl","cliIP":"123.129.390.140","agentHost":"hostname.sample.com","logType":"RangerAudit","id":"94143368-600c-44b9-a0c8-d906b4367537","seq_num":1240883,"event_count":1,"event_dur_ms":0}
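As a side note (not in the original post): each audit entry is a single flat JSON object per line, which is easy to confirm with a few lines of Python and is what makes the line-oriented JSON SerDe approaches work.

```python
import json

# One Ranger audit entry, abbreviated from the sample line above
# (one JSON object per line in the audit log files)
line = ('{"repoType":1,"repo":"abc_hadoop","reqUser":"ams",'
        '"evtTime":"2016-09-19 13:14:40.197","access":"READ",'
        '"resType":"path","result":1,"policy":41,"enforcer":"ranger-acl",'
        '"cliIP":"123.129.390.140","logType":"RangerAudit",'
        '"seq_num":1240883,"event_count":1,"event_dur_ms":0}')

record = json.loads(line)

# Confirm the record is flat: no nested objects or arrays,
# so every field maps directly to a scalar Hive column
assert not any(isinstance(v, (dict, list)) for v in record.values())
print(sorted(record))  # the field names a Hive schema would need
```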
Created 12-02-2016 02:17 PM
I was able to create a Hive table on top of the JSON files. Below is the syntax I used to create the external table, so I do not have to move any data; all I need to do is add partitions.
CREATE EXTERNAL TABLE hdfs_audit(
  access string,
  agenthost string,
  cliip string,
  enforcer string,
  event_count bigint,
  event_dur_ms bigint,
  evttime timestamp,
  id string,
  logtype string,
  policy bigint,
  reason string,
  repo string,
  repotype bigint,
  requser string,
  restype string,
  resource string,
  result bigint,
  seq_num bigint)
PARTITIONED BY (
  evt_time string)
ROW FORMAT SERDE
  'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://CLUSTERNAME/ranger/database/hdfs';
Add partition:
ALTER TABLE ranger_audit.hdfs_audit ADD PARTITION (evt_time='20160601') LOCATION '/ranger/audit/hdfs/20160601/hdfs/20160601';
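Since the Ranger audit directories are laid out by date, the daily ADD PARTITION statements can be generated with a small script rather than written by hand. A minimal sketch, assuming the path layout shown in the example above (adjust the base path for your cluster):

```python
from datetime import date, timedelta

def partition_ddl(table, start, end, base="/ranger/audit/hdfs"):
    """Generate one ALTER TABLE ... ADD PARTITION statement per day,
    following the <base>/<yyyymmdd>/hdfs/<yyyymmdd> layout from the example."""
    stmts = []
    day = start
    while day <= end:
        d = day.strftime("%Y%m%d")
        stmts.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (evt_time='{d}') "
            f"LOCATION '{base}/{d}/hdfs/{d}';"
        )
        day += timedelta(days=1)
    return stmts

for stmt in partition_ddl("ranger_audit.hdfs_audit",
                          date(2016, 6, 1), date(2016, 6, 3)):
    print(stmt)
```

IF NOT EXISTS makes the generated statements safe to re-run when the script is scheduled daily.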
Created 11-10-2016 03:58 AM
One way you can do it easily is by using hive-serde-schema-gen (https://github.com/strelec/hive-serde-schema-gen). Another way is to use the Hive JSON SerDe (https://github.com/rcongiu/Hive-JSON-Serde).
The formatted json is below:
{ "repoType":1, "repo":"abc_hadoop", "reqUser":"ams", "evtTime":"2016-09-19 13:14:40.197", "access":"READ", "resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318", "resType":"path", "result":1, "policy":41, "reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318", "enforcer":"ranger-acl", "cliIP":"123.129.390.140", "agentHost":"hostname.sample.com", "logType":"RangerAudit", "id":"94143368-600c-44b9-a0c8-d906b4367537", "seq_num":1240883, "event_count":1, "event_dur_ms":0 }
Since the JSON is not nested, the above choices are certainly doable. However, perhaps the easiest way is to follow this option: https://community.hortonworks.com/articles/37937/importing-and-querying-json-data-in-hive.html
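For a flat record like this, the type inference that those schema-generator tools perform can be sketched in a few lines of Python. This is a simplified illustration of the idea, not the tools' actual code:

```python
import json

def infer_hive_columns(json_line):
    """Infer a flat Hive column list from one JSON record."""
    def hive_type(value):
        # bool must be checked before int: in Python, True is an int
        if isinstance(value, bool):
            return "boolean"
        if isinstance(value, int):
            return "bigint"
        if isinstance(value, float):
            return "double"
        return "string"
    record = json.loads(json_line)
    # Lowercase the names, matching Hive's case-insensitive column handling
    return [(key.lower(), hive_type(value)) for key, value in record.items()]

sample = '{"repoType":1,"repo":"abc_hadoop","reqUser":"ams","result":1,"event_dur_ms":0}'
for name, htype in infer_hive_columns(sample):
    print(f"{name} {htype}")
```

Running this over one audit line yields column/type pairs that match the DDL in the accepted answer (e.g. repotype bigint, repo string).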
Created 11-11-2016 07:39 PM
Also refer to https://community.hortonworks.com/content/kbentry/60802/ranger-audit-in-hive-table-a-sample-approach.... on how audit logs in HDFS can serve as a Hive table.