create hive table

Solved

Expert Contributor

I need some help creating a Hive table for the format below. This is Ranger audit info; I will create partitions accordingly. The info below is just one line:

{"repoType":1,"repo":"abc_hadoop","reqUser":"ams","evtTime":"2016-09-19 13:14:40.197","access":"READ","resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","resType":"path","result":1,"policy":41,"reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318","enforcer":"ranger-acl","cliIP":"123.129.390.140","agentHost":"hostname.sample.com","logType":"RangerAudit","id":"94143368-600c-44b9-a0c8-d906b4367537","seq_num":1240883,"event_count":1,"event_dur_ms":0}

1 ACCEPTED SOLUTION

Re: create hive table

Expert Contributor

I was able to create a Hive table on top of the JSON files. Below is the syntax I used to create the external table, so I do not have to move the data; all I need to do is add partitions.

CREATE EXTERNAL TABLE hdfs_audit(
  access string,
  agenthost string,
  cliip string,
  enforcer string,
  event_count bigint,
  event_dur_ms bigint,
  evttime timestamp,
  id string,
  logtype string,
  policy bigint,
  reason string,
  repo string,
  repotype bigint,
  requser string,
  restype string,
  resource string,
  result bigint,
  seq_num bigint)
PARTITIONED BY (
  evt_time string)
ROW FORMAT SERDE
  'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://CLUSTERNAME/ranger/database/hdfs';
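One note in case Hive cannot find the SerDe class at query time: org.apache.hive.hcatalog.data.JsonSerDe ships in hive-hcatalog-core, which may need to be added to the session first. The jar path below is only an assumption based on a typical HDP layout and will vary by install.

-- Path is an assumption for a typical HDP install; adjust to your environment.
ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;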

Add partition:

ALTER TABLE ranger_audit.hdfs_audit ADD PARTITION (evt_time='20160601') LOCATION '/ranger/audit/hdfs/20160601/hdfs/20160601';
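Once a partition is added, the data can be queried in place. The statement below is just a minimal sketch of such a query; the filter values are illustrative only, and it reuses the ranger_audit database name from the ALTER TABLE above.

SELECT evttime, requser, access, resource
FROM ranger_audit.hdfs_audit
WHERE evt_time = '20160601'   -- prunes to the partition added above
  AND access = 'READ'         -- example filter on an audited action
LIMIT 10;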

3 REPLIES

Re: create hive table

Super Guru

One way you can do it easily is by using hive-serde-schema-gen (https://github.com/strelec/hive-serde-schema-gen). Another way is to use the Hive JSON SerDe (https://github.com/rcongiu/Hive-JSON-Serde).

The formatted JSON is below:

{  
   "repoType":1,
   "repo":"abc_hadoop",
   "reqUser":"ams",
   "evtTime":"2016-09-19 13:14:40.197",
   "access":"READ",
   "resource":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
   "resType":"path",
   "result":1,
   "policy":41,
   "reason":"/ambari-metrics-collector/hbase/data/hbase/meta/1588230740/info/ed3e52d8b86e4800801539fc4a7b1318",
   "enforcer":"ranger-acl",
   "cliIP":"123.129.390.140",
   "agentHost":"hostname.sample.com",
   "logType":"RangerAudit",
   "id":"94143368-600c-44b9-a0c8-d906b4367537",
   "seq_num":1240883,
   "event_count":1,
   "event_dur_ms":0
}

Since the JSON is not nested, either of the above choices should work. However, perhaps the easiest way to do it is the option described here: https://community.hortonworks.com/articles/37937/importing-and-querying-json-data-in-hive.html
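If you go the Hive-JSON-Serde route mentioned above, a table definition might look roughly like the sketch below. This is only a sketch: it assumes the json-serde jar from that project has been added with ADD JAR, uses its org.openx.data.jsonserde.JsonSerDe class, borrows the LOCATION from the accepted answer, and the table name hdfs_audit_openx is made up for illustration.

-- Sketch only: requires ADD JAR of the Hive-JSON-Serde jar before creating/querying.
CREATE EXTERNAL TABLE hdfs_audit_openx(
  repotype int,
  repo string,
  requser string,
  evttime string,
  access string,
  resource string,
  restype string,
  result int,
  policy int,
  reason string,
  enforcer string,
  cliip string,
  agenthost string,
  logtype string,
  id string,
  seq_num bigint,
  event_count int,
  event_dur_ms int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'hdfs://CLUSTERNAME/ranger/database/hdfs';

It could be partitioned by an evt_time string column the same way as the accepted answer.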

Re: create hive table

Expert Contributor

Also refer to https://community.hortonworks.com/content/kbentry/60802/ranger-audit-in-hive-table-a-sample-approach.... on how audit logs in HDFS can serve as a Hive table.
