Created 09-12-2018 12:27 PM
I have this table with what I believe is a nested column.
I created this table with the statement:
create table testtbl stored as AVRO TBLPROPERTIES ('avro.schema.url'='hdfs://testhost:8020/tmp/avroschemas/testtbl.json');
testtbl.json looks like:
{
  "type" : "record",
  "name" : "testtbl",
  "namespace" : "orgn.data.domain",
  "fields" : [ {
    "name" : "id",
    "type" : {
      "type" : "record",
      "name" : "Key",
      "fields" : [ {
        "name" : "TId",
        "type" : "string"
      }, {
        "name" : "action",
        "type" : "string"
      }, {
        "name" : "createdTS",
        "type" : {
          "type" : "long",
          "logicalType" : "timestamp-millis"
        }
      } ]
    }
  }, {
    "name" : "CId",
    "type" : "string"
  }, {
    "name" : "ANumber",
    "type" : "string"
  } ]
}
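As a quick sanity check (not from the original thread, just a sketch assuming the schema is saved exactly as posted), the schema file is plain JSON, so the standard library is enough to confirm it parses and to list the top-level and nested field names:

```python
import json

# The Avro schema from testtbl.json, as posted above.
schema_text = '''
{
  "type": "record",
  "name": "testtbl",
  "namespace": "orgn.data.domain",
  "fields": [
    {"name": "id", "type": {
        "type": "record", "name": "Key",
        "fields": [
          {"name": "TId", "type": "string"},
          {"name": "action", "type": "string"},
          {"name": "createdTS",
           "type": {"type": "long", "logicalType": "timestamp-millis"}}
        ]}},
    {"name": "CId", "type": "string"},
    {"name": "ANumber", "type": "string"}
  ]
}
'''

schema = json.loads(schema_text)
top_level = [f["name"] for f in schema["fields"]]
nested = [f["name"] for f in schema["fields"][0]["type"]["fields"]]
print(top_level)  # ['id', 'CId', 'ANumber']
print(nested)     # ['TId', 'action', 'createdTS']
```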
Can somebody give me a valid INSERT statement to insert one row into this table?
Appreciate the help.
Created 09-12-2018 01:10 PM
Try the INSERT statement below:
0: jdbc:hive2://abcd:10000> with t as (select NAMED_STRUCT('tid','1','action','success','createdts',current_timestamp) as id, '1' as cid, '12345' as anumber)
0: jdbc:hive2://abcd:10000> insert into testtbl select * from t;
No rows affected (20.464 seconds)
0: jdbc:hive2://abcd:10000> select * from testtbl;
+-----------------------------------------------------------------------+--------------+------------------+--+
|                              testtbl.id                               | testtbl.cid  | testtbl.anumber  |
+-----------------------------------------------------------------------+--------------+------------------+--+
| {"tid":"1","action":"success","createdts":"2018-09-12 15:06:27.075"}  | 1            | 12345            |
+-----------------------------------------------------------------------+--------------+------------------+--+
Created 09-12-2018 03:58 PM
So is there no way to load the data automatically from files arriving in a particular HDFS directory?
Created 09-12-2018 04:06 PM
You can create an external table with a LOCATION and write the text files directly to that path.
e.g., create external table staging1(id struct<tid:string,action:string,createdts:timestamp>, cid string, anumber string) row format delimited fields terminated by ',' collection items terminated by '|' stored as textfile LOCATION '/tmp/staging/';
All text files can be written directly to /tmp/staging/ by Kafka or Flume.
If Kafka or Flume can generate Avro files, then you can skip the staging table, create an external Avro table, and write the Avro files directly to that table's location.
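To make the expected file layout concrete (a sketch, not something the thread specifies): with fields terminated by ',' and collection items terminated by '|', one row for staging1 is a single text line in which the struct members come first, '|'-separated, followed by the remaining two columns. The values below are just the sample row from the earlier insert:

```python
# One row for the staging1 table defined above:
#   id struct<tid:string, action:string, createdts:timestamp>, cid string, anumber string
# Top-level fields are ','-separated; the struct's members are '|'-separated.
struct_members = ["1", "success", "2018-09-12 15:06:27.075"]  # tid, action, createdts
cid = "1"
anumber = "12345"

line = ",".join(["|".join(struct_members), cid, anumber])
print(line)  # 1|success|2018-09-12 15:06:27.075,1,12345
```

A file of such lines dropped into /tmp/staging/ would then be readable through the staging1 table without any further load step.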
Created 09-12-2018 05:02 PM
Naresh, you are the man! Thanks so much!
Created 09-12-2018 06:53 PM
Any idea what is wrong with this:
CREATE EXTERNAL TABLE staging3
ROW FORMAT SERDE 'org.apache.hadoop.hive.serd2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/avroschemas/testtbl.json')
LOCATION '/tmp/staging';
I am getting :
FAILED: ParseException line 7:0 missing EOF at 'LOCATION' near ')'
Created 09-12-2018 06:58 PM
It's because TBLPROPERTIES must be the last clause in the statement. Use the version below and it should help (note also that the SerDe class is org.apache.hadoop.hive.serde2.avro.AvroSerDe; 'serd2' is a typo that will fail once the table is used):
+++++++++++++++++
CREATE EXTERNAL TABLE staging3
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/staging'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/avroschemas/testtbl.json');
+++++++++++++++++
Created 09-12-2018 07:38 PM
So with a table created as above, how should the data be formatted so it can be loaded in? In what format, that is? Because I am not specifying any delimiters, etc. Appreciate the insights.
Created 09-13-2018 12:54 PM
@Mahesh Balakrishnan @Naresh P R @Shu
Can I have feedback on how the data should be formatted to be loaded (via the LOAD DATA INPATH command) into a table created as:
CREATE EXTERNAL TABLE staging3
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/staging'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/avroschemas/testtbl.json');
The schema is the same as described earlier, but there is no delimiter specified. I would appreciate a sample piece of data for this table.
Created 09-13-2018 03:00 PM
There are multiple ways to populate Avro tables.
1) INSERT INTO avro_table VALUES (<col1>, <col2>, ..., <colN>) -- this way Hive writes the Avro files for you.
2) Generate Avro files and copy them directly to '/tmp/staging'. You can read the Avro documentation on writing Avro files directly to an HDFS path. The Avro reader/writer APIs take care of storing and retrieving records, so you don't need to explicitly specify delimiters for Avro files.
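A third route, combining the two answers in this thread (a sketch, assuming both the delimited staging1 table and the Avro staging3 table exist, and that your Hive version maps the Avro timestamp-millis logical type to a Hive timestamp): land the delimited text files in staging1 and let Hive rewrite them as Avro.

```sql
-- Hypothetical bridge between the two tables defined earlier in this thread:
-- text files land in staging1, Hive converts the rows to Avro in staging3.
INSERT INTO staging3 SELECT id, cid, anumber FROM staging1;
```

This keeps Kafka/Flume writing plain delimited text while still ending up with Avro-backed data.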