You may want to take a look at hive-json (https://github.com/hortonworks/hive-json) to extract the schema from the JSON file.
I ran:
bin/find-json-schema ~/projects/structor/modules/druid_overlord/files/TimeSeriesQuery.json
Then I created the table from the resulting schema, with the row format set to the JsonSerDe:
create table tbl (
aggregations array <struct <
fieldName: string,
name: string,
type: string>>,
dataSource string,
granularity string,
intervals array <string>,
queryType string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
;
Load the data (note that the JsonSerDe expects one JSON document per line of the file):
LOAD DATA LOCAL INPATH "/home/kirk/projects/structor/modules/druid_overlord/files/TimeSeriesQuery.json" INTO TABLE tbl;
Switch to ORC (a columnar format, so queries that touch only a few columns read less data):
create table tbl_orc (
aggregations array <struct <
fieldName: string,
name: string,
type: string>>,
dataSource string,
granularity string,
intervals array <string>,
queryType string
)
STORED AS ORC
;
insert overwrite table tbl_orc select * from tbl;
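If you would rather not repeat the column list, a CREATE TABLE ... AS SELECT should produce an equivalent ORC table in one step. This is only a minimal sketch; tbl_orc_ctas is a made-up name, chosen so that it does not clash with tbl_orc above.

```sql
-- Alternative to the explicit DDL + INSERT OVERWRITE above.
-- tbl_orc_ctas is a hypothetical table name; the column names and
-- types (including the nested array/struct) are taken from tbl.
CREATE TABLE tbl_orc_ctas
STORED AS ORC
AS
SELECT * FROM tbl;
```

The explicit DDL is still the better choice if you want to rename columns or adjust types on the way into ORC.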
select count(*) from tbl_orc where queryType = 'timeseries';
Now you can start querying tbl_orc.
I only found out about hive-json today, but it works like a charm 🙂
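For the nested columns, something along these lines should work. It is a sketch against the schema above, not verified against this particular file; the aliases a, agg, n, first_interval and n_aggs are arbitrary names picked for the example.

```sql
-- One output row per entry of the aggregations array,
-- counted by aggregation name and field name.
SELECT agg.name, agg.fieldName, count(*) AS n
FROM tbl_orc
LATERAL VIEW explode(aggregations) a AS agg
WHERE queryType = 'timeseries'
GROUP BY agg.name, agg.fieldName;

-- Array helpers: first interval and number of aggregations per row.
SELECT intervals[0] AS first_interval,
       size(aggregations) AS n_aggs
FROM tbl_orc;
```

LATERAL VIEW explode turns each array element into its own row, which is usually the easiest way to filter or group on fields inside an array of structs.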