Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

create hive table

Super Collaborator

I need some help creating a Hive table for the format below. This is Ranger audit info, and I will create partitions accordingly. The sample below is just one line:

{"repoType":1,"repo":"abc_hadoop","reqUser":"ams","evtTime":"2016-09-19 13:14:40.197","access":"READ","resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","resType":"path","result":1,"policy":41,"reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","enforcer":"ranger-acl","cliIP":"123.129.390.140","agentHost":"hostname.sample.com","logType":"RangerAudit","id":"94143368-600c-44b9-a0c8-d906b4367537","seq_num":1240883,"event_count":1,"event_dur_ms":0}

1 ACCEPTED SOLUTION

Super Collaborator

I was able to create a Hive table on top of the JSON files. Below is the syntax I used to create the external table, so I do not have to move the data; all I need to do is add partitions.

CREATE EXTERNAL TABLE hdfs_audit(
  access string,
  agenthost string,
  cliip string,
  enforcer string,
  event_count bigint,
  event_dur_ms bigint,
  evttime timestamp,
  id string,
  logtype string,
  policy bigint,
  reason string,
  repo string,
  repotype bigint,
  requser string,
  restype string,
  resource string,
  result bigint,
  seq_num bigint)
PARTITIONED BY (evt_time string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://CLUSTERNAME/ranger/database/hdfs';
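Note: the `org.apache.hive.hcatalog.data.JsonSerDe` class ships in hive-hcatalog-core, which is not always on Hive's classpath by default. If the CREATE TABLE or a later query fails with a ClassNotFoundException, the jar can be added in the session first (a sketch; the path below is a typical HDP layout and may differ on other installs):

```sql
-- Assumption: HDP-style install; adjust the path for your distribution/version
ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;
```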

Add partition:

ALTER TABLE ranger_audit.hdfs_audit ADD PARTITION (evt_time='20160601') LOCATION '/ranger/audit/hdfs/20160601/hdfs/20160601';
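Once a partition is registered, the data is queryable in place with no load step. A hypothetical follow-up (assuming the partition above exists and the table lives in the ranger_audit database, as in the ALTER TABLE statement):

```sql
-- Confirm the partition was registered
SHOW PARTITIONS ranger_audit.hdfs_audit;

-- Count READ events per user for that day; filtering on evt_time
-- prunes the scan to the single partition directory
SELECT requser, COUNT(*) AS read_events
FROM ranger_audit.hdfs_audit
WHERE evt_time = '20160601' AND access = 'READ'
GROUP BY requser;
```

One ALTER TABLE ... ADD PARTITION is needed per day, since each day's audit logs sit in their own directory.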


3 REPLIES

Master Guru

One way you can do it easily is with hive-serde-schema-generator (https://github.com/strelec/hive-serde-schema-gen). Another way is to use the Hive JSON SerDe (https://github.com/rcongiu/Hive-JSON-Serde).

The formatted json is below:

{  
   "repoType":1,
   "repo":"abc_hadoop",
   "reqUser":"ams",
   "evtTime":"2016-09-19 13:14:40.197",
   "access":"READ",
   "resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
   "resType":"path",
   "result":1,
   "policy":41,
   "reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
   "enforcer":"ranger-acl",
   "cliIP":"123.129.390.140",
   "agentHost":"hostname.sample.com",
   "logType":"RangerAudit",
   "id":"94143368-600c-44b9-a0c8-d906b4367537",
   "seq_num":1240883,
   "event_count":1,
   "event_dur_ms":0
}

Since the JSON is not nested, both of the above options should work. However, perhaps the easiest way is the approach described here: https://community.hortonworks.com/articles/37937/importing-and-querying-json-data-in-hive.html
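For comparison, a table built on the OpenX SerDe from the second link would differ mainly in the ROW FORMAT clause. A minimal sketch (the column subset and jar path are illustrative, not from the original post):

```sql
-- Assumption: the json-serde jar from rcongiu/Hive-JSON-Serde is available locally
ADD JAR /path/to/json-serde-with-dependencies.jar;

CREATE EXTERNAL TABLE hdfs_audit_openx (
  requser STRING,
  evttime STRING,
  access  STRING,
  result  BIGINT
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'hdfs://CLUSTERNAME/ranger/database/hdfs';
```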

Super Collaborator

Also refer to https://community.hortonworks.com/content/kbentry/60802/ranger-audit-in-hive-table-a-sample-approach... on how audit logs in HDFS can serve a Hive table.

(Same as the accepted solution above.)