
create hive table

Super Collaborator

I need some help creating a Hive table for the format below. This is Ranger audit info, and I will create partitions accordingly. The info below is just one line:

{"repoType":1,"repo":"abc_hadoop","reqUser":"ams","evtTime":"2016-09-19 13:14:40.197","access":"READ","resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","resType":"path","result":1,"policy":41,"reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","enforcer":"ranger-acl","cliIP":"123.129.390.140","agentHost":"hostname.sample.com","logType":"RangerAudit","id":"94143368-600c-44b9-a0c8-d906b4367537","seq_num":1240883,"event_count":1,"event_dur_ms":0}
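Before designing the table, the record can be sanity-checked with any JSON parser. This is an illustrative Python sketch (field subset shortened for readability) that parses one audit line and derives a day-grained partition value of the kind used later in the thread:

```python
import json

# One raw Ranger audit line (resource fields trimmed for readability).
line = ('{"repoType":1,"repo":"abc_hadoop","reqUser":"ams",'
        '"evtTime":"2016-09-19 13:14:40.197","access":"READ",'
        '"resType":"path","result":1,"policy":41,"enforcer":"ranger-acl",'
        '"logType":"RangerAudit","seq_num":1240883,"event_count":1,'
        '"event_dur_ms":0}')

record = json.loads(line)

# Derive a yyyymmdd partition value from evtTime, matching the
# evt_time='20160601' style used when adding partitions.
evt_date = record["evtTime"][:10].replace("-", "")
print(evt_date)  # 20160919
```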

1 ACCEPTED SOLUTION

Super Collaborator

I was able to create a Hive table on top of the JSON files. Below is the syntax I used to create the external table, so I do not have to move the data; all I need to do is add partitions.

CREATE EXTERNAL TABLE hdfs_audit (
  access string,
  agenthost string,
  cliip string,
  enforcer string,
  event_count bigint,
  event_dur_ms bigint,
  evttime timestamp,
  id string,
  logtype string,
  policy bigint,
  reason string,
  repo string,
  repotype bigint,
  requser string,
  restype string,
  resource string,
  result bigint,
  seq_num bigint)
PARTITIONED BY (evt_time string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://CLUSTERNAME/ranger/database/hdfs';

Add partition:

ALTER TABLE ranger_audit.hdfs_audit ADD PARTITION (evt_time='20160601') LOCATION '/ranger/audit/hdfs/20160601/hdfs/20160601';
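Since one ALTER TABLE statement is needed per day, the statements can be generated rather than typed by hand. A minimal Python sketch, assuming the same table name and directory layout as the statement above:

```python
from datetime import date, timedelta

def add_partition_stmts(start, end):
    """Yield one ADD PARTITION statement per day in [start, end]."""
    day = start
    while day <= end:
        d = day.strftime("%Y%m%d")
        yield (
            f"ALTER TABLE ranger_audit.hdfs_audit "
            f"ADD PARTITION (evt_time='{d}') "
            f"LOCATION '/ranger/audit/hdfs/{d}/hdfs/{d}';"
        )
        day += timedelta(days=1)

# Print statements for the first three days of June 2016.
for stmt in add_partition_stmts(date(2016, 6, 1), date(2016, 6, 3)):
    print(stmt)
```

The generated output can be saved to a file and run with `hive -f`.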


3 REPLIES

Master Guru

One way to do this easily is with hive-serde-schema-gen (https://github.com/strelec/hive-serde-schema-gen). Another way is to use the Hive JSON SerDe (https://github.com/rcongiu/Hive-JSON-Serde).

The formatted JSON is below:

{  
   "repoType":1,
   "repo":"abc_hadoop",
   "reqUser":"ams",
   "evtTime":"2016-09-19 13:14:40.197",
   "access":"READ",
   "resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
   "resType":"path",
   "result":1,
   "policy":41,
   "reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
   "enforcer":"ranger-acl",
   "cliIP":"123.129.390.140",
   "agentHost":"hostname.sample.com",
   "logType":"RangerAudit",
   "id":"94143368-600c-44b9-a0c8-d906b4367537",
   "seq_num":1240883,
   "event_count":1,
   "event_dur_ms":0
}

Since the JSON is not nested, either of the above choices should work. However, perhaps the easiest way to do it is the option described here: https://community.hortonworks.com/articles/37937/importing-and-querying-json-data-in-hive.html
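For a flat record like this one, the core of what a schema-generation tool does can be approximated in a few lines. A rough Python sketch (the type mapping is deliberately simplified; a real tool also handles nested structs, arrays, floats, and nulls):

```python
import json

def hive_type(value):
    # Minimal JSON-to-Hive type mapping for flat records.
    if isinstance(value, bool):   # must come before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"

record = json.loads('{"repoType":1,"reqUser":"ams","result":1,"event_dur_ms":0}')
columns = [f"{name.lower()} {hive_type(value)}" for name, value in record.items()]
print(",\n".join(columns))
```

Column names are lowercased because Hive column names are case-insensitive and conventionally lowercase.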

Super Collaborator

Also refer to https://community.hortonworks.com/content/kbentry/60802/ranger-audit-in-hive-table-a-sample-approach.... on how audit logs in HDFS can serve as a Hive table.
