Created 11-10-2016 12:59 AM
I need some help creating a Hive table for the format below. This is Ranger audit info; I will create partitions accordingly. The info below is just one line:
{"repoType":1,"repo":"abc_hadoop","reqUser":"ams","evtTime":"2016-09-19 13:14:40.197","access":"READ","resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","resType":"path","result":1,"policy":41,"reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","enforcer":"ranger-acl","cliIP":"123.129.390.140","agentHost":"hostname.sample.com","logType":"RangerAudit","id":"94143368-600c-44b9-a0c8-d906b4367537","seq_num":1240883,"event_count":1,"event_dur_ms":0}
Created 12-02-2016 02:17 PM
I was able to create a Hive table on top of the JSON files. Below is the syntax I used to create the external table, so I do not have to move data; all I need to do is add partitions.
CREATE EXTERNAL TABLE hdfs_audit (
  access string,
  agenthost string,
  cliip string,
  enforcer string,
  event_count bigint,
  event_dur_ms bigint,
  evttime timestamp,
  id string,
  logtype string,
  policy bigint,
  reason string,
  repo string,
  repotype bigint,
  requser string,
  restype string,
  resource string,
  result bigint,
  seq_num bigint)
PARTITIONED BY (evt_time string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://CLUSTERNAME/ranger/database/hdfs';
Add partition:
ALTER TABLE ranger_audit.hdfs_audit ADD PARTITION (evt_time='20160601') LOCATION '/ranger/audit/hdfs/20160601/hdfs/20160601';
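If you need to register one partition per day, a small script can generate the ALTER TABLE statements rather than typing them by hand (a sketch; the table name and the /ranger/audit/hdfs/&lt;date&gt;/hdfs/&lt;date&gt; directory layout are taken from the example above and may differ on your cluster):

```python
from datetime import date, timedelta

def add_partition_statements(start, end, table="ranger_audit.hdfs_audit"):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in [start, end]."""
    day = start
    while day <= end:
        d = day.strftime("%Y%m%d")
        # IF NOT EXISTS makes the statement safe to re-run.
        yield (f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (evt_time='{d}') "
               f"LOCATION '/ranger/audit/hdfs/{d}/hdfs/{d}';")
        day += timedelta(days=1)

stmts = list(add_partition_statements(date(2016, 6, 1), date(2016, 6, 3)))
for s in stmts:
    print(s)
```

The printed statements can then be fed to beeline or the hive CLI in one batch.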
Created 11-10-2016 03:58 AM
One way you can do it easily is by using the hive-serde-schema-generator (https://github.com/strelec/hive-serde-schema-gen). Another way is to use the Hive JSON SerDe (https://github.com/rcongiu/Hive-JSON-Serde).
The formatted JSON is below:
{
"repoType":1,
"repo":"abc_hadoop",
"reqUser":"ams",
"evtTime":"2016-09-19 13:14:40.197",
"access":"READ",
"resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
"resType":"path",
"result":1,
"policy":41,
"reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
"enforcer":"ranger-acl",
"cliIP":"123.129.390.140",
"agentHost":"hostname.sample.com",
"logType":"RangerAudit",
"id":"94143368-600c-44b9-a0c8-d906b4367537",
"seq_num":1240883,
"event_count":1,
"event_dur_ms":0
}
Since the JSON is not nested, the above choices are definitely doable. However, perhaps the easiest way to do it is the option described in this article (https://community.hortonworks.com/articles/37937/importing-and-querying-json-data-in-hive.html).
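The schema-generation approach above boils down to mapping each key in a sample record to a Hive column type. A minimal sketch of that idea in Python (the type mapping is a deliberate simplification; the real generators also handle nesting, arrays, and type widening):

```python
import json

# Simplified mapping from a sample record's Python types to Hive column types.
HIVE_TYPES = {bool: "boolean", int: "bigint", float: "double", str: "string"}

def hive_columns(sample_json: str):
    """Return (column_name, hive_type) pairs inferred from one flat JSON record."""
    record = json.loads(sample_json)
    return [(key.lower(), HIVE_TYPES[type(value)]) for key, value in record.items()]

sample = '{"repoType":1,"repo":"abc_hadoop","reqUser":"ams","result":1,"event_dur_ms":0}'
cols = hive_columns(sample)
ddl = ("CREATE EXTERNAL TABLE hdfs_audit (\n  "
       + ",\n  ".join(f"{name} {htype}" for name, htype in cols) + ")")
print(ddl)
```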
Created 11-11-2016 07:39 PM
Also refer to this article https://community.hortonworks.com/content/kbentry/60802/ranger-audit-in-hive-table-a-sample-approach.... on how audit logs in HDFS can serve a Hive table.