
How to insert data into this table

Rising Star

I have this table with what I believe is a nested column.

I created this table with the statement:

create table testtbl stored as AVRO TBLPROPERTIES ('avro.schema.url'='hdfs://testhost:8020/tmp/avroschemas/testtbl.json');

testtbl.json looks like:

{
  "type" : "record",
  "name" : "testtbl",
  "namespace" : "orgn.data.domain",
  "fields" : [ {
    "name" : "id",
    "type" : {
      "type" : "record",
      "name" : "Key",
      "fields" : [ {
        "name" : "TId",
        "type" : "string"
      }, {
        "name" : "action",
        "type" : "string"
      }, {
        "name" : "createdTS",
        "type" : {
          "type" : "long",
          "logicalType" : "timestamp-millis"
        }
      } ]
    }
  }, {
    "name" : "CId",
    "type" : "string"
  }, {
    "name" : "ANumber",
    "type" : "string"
  } ]
}
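
For reference, Hive should expose this Avro schema as one struct column plus two string columns. A rough sketch of what DESCRIBE should report (assuming the timestamp-millis logical type surfaces as a Hive timestamp, as it does in the accepted solution below):

DESCRIBE testtbl;
-- expected layout (sketch, not verified output):
-- id        struct<tid:string,action:string,createdts:timestamp>
-- cid       string
-- anumber   string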

Can somebody give me a valid INSERT statement to insert one row into this table?

Appreciate the help.

1 ACCEPTED SOLUTION

Expert Contributor

Try the insert statement below:

0: jdbc:hive2://abcd:10000> with t as (select NAMED_STRUCT('tid','1','action','success', 'createdts',current_timestamp) as id ,'1' as cid,'12345' as anumber)
0: jdbc:hive2://abcd:10000> insert into testtbl select * from t;
No rows affected (20.464 seconds)
0: jdbc:hive2://abcd:10000> select * from testtbl;
+-----------------------------------------------------------------------+--------------+------------------+--+
|                              testtbl.id                               | testtbl.cid  | testtbl.anumber  |
+-----------------------------------------------------------------------+--------------+------------------+--+
| {"tid":"1","action":"success","createdts":"2018-09-12 15:06:27.075"}  | 1            | 12345            |
+-----------------------------------------------------------------------+--------------+------------------+--+
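
As a quick follow-up to the accepted solution, the nested fields can be read back with ordinary dot notation on the struct column. A minimal sketch, assuming the row inserted above:

select id.tid, id.action, id.createdts, cid, anumber from testtbl;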


17 REPLIES

Rising Star

So is there no way to load the data automatically from files arriving in a particular directory in HDFS?

Expert Contributor

You can create an external table with a LOCATION and write the text files directly to that path.

e.g., create external table staging1(id struct<tid:string,action:string,createdts:timestamp>, cid string, anumber string) row format delimited fields terminated by ',' collection items terminated by '|' stored as textfile LOCATION '/tmp/staging/';

Text files can then be written directly to /tmp/staging/ by Kafka or Flume.

If Kafka or Flume can generate Avro files, you can skip the staging table, create an external Avro table instead, and write the Avro files directly to that table's location.
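
For illustration, with those delimiters a row in a text file under /tmp/staging/ would use '|' between the struct members and ',' between columns, and the staging table can then be copied into the Avro table with an INSERT ... SELECT. A minimal sketch, with made-up values and assuming the testtbl table from the original question:

-- one line of a text file in /tmp/staging/ (example values only):
-- 1|success|2018-09-12 15:06:27.075,1,12345

insert into testtbl select * from staging1;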

Rising Star

naresh, you are the man!!! thanks so much!!!

Rising Star

Any idea what is wrong with this:

CREATE EXTERNAL TABLE staging3
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/avroschemas/testtbl.json')
LOCATION '/tmp/staging';

I am getting:

FAILED: ParseException line 7:0 missing EOF at 'LOCATION' near ')'

Expert Contributor

@n c

It's because TBLPROPERTIES should be the last clause; use the statement below and it should help:

+++++++++++++++++

CREATE EXTERNAL TABLE staging3
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/staging'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/avroschemas/testtbl.json');

+++++++++++++++++

Rising Star

So with a table created as above, how should the data look so it can be loaded in, i.e., in what format? Because I am not specifying any delimiters, etc. Appreciate the insights.

Rising Star

@Mahesh Balakrishnan @Naresh P R @Shu

Can I have feedback on how the data should be formatted to be loaded (LOAD DATA INPATH command) into a table created as:

CREATE EXTERNAL TABLE staging3
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/staging'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/avroschemas/testtbl.json');

The schema is the same as described earlier, but there is no delimiter specified.

I would appreciate it if you could provide a sample piece of data for this.

Expert Contributor

There are multiple ways to populate Avro tables; minimal sketches of both options follow below.

1) INSERT INTO avro_table VALUES(<col1>, <col2>, ..., <colN>) -- this way Hive writes the Avro files for you.

2) Generate Avro files and copy them directly to '/tmp/staging'. You can read the Avro documentation to learn how to write Avro files directly to an HDFS path. The Avro reader/writer APIs take care of storing and retrieving records, so you don't need to explicitly specify delimiters for Avro files.
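
Minimal sketches of both options, reusing the table names from this thread and assuming staging3 was created with the corrected statement above (the Avro file path in option 2 is only a placeholder):

-- Option 1: let Hive write the Avro file. INSERT ... VALUES cannot build the struct column,
-- so reuse the CTE + named_struct pattern from the accepted solution:
with t as (select named_struct('tid','2','action','success','createdts',current_timestamp) as id, '2' as cid, '67890' as anumber)
insert into staging3 select * from t;

-- Option 2: move an already-generated Avro file into the table's location:
LOAD DATA INPATH '/tmp/incoming/testtbl-data.avro' INTO TABLE staging3;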