Support Questions

Find answers, ask questions, and share your expertise

Struct datatype insert in hive

avatar

I have created a table create table data ( id int,name string, addreess struc<city:string,State:string>) rows format delimited fields terminated by ',' collections terminated by ',' stored as 'TEXTFILE';

If I use insert into table data select 1,'Bala',named_struct('city','Tampa','state','FL') from anothertable limit 1; Is this correct way to insert a record into hive table. Because when I do that address.state is inserted as null

2 REPLIES 2

avatar
Rising Star

Hive does not support literals for complex types (array, map, struct, union), so it is not possible to use them in INSERT INTO...VALUES clauses. This means that the user cannot insert data into a complex datatype column using the INSERT INTO...VALUES clause. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Insertingdatai...

avatar

Hi @Bala Vignesh N V. I know this is an old question, but I have encountered this recently. This answer may help someone else as well...

The issue you had is most likely caused by specifying " COLLECTION ITEMS TERMINATED BY ',' ".

When the table is defined like this (with COLLECTION ITEMS TERMINATED BY comma):

-- Create a Dummy table to use in the insert query - like an Oracle DUAL table
create table dummy_TBL (col1 int) ;
insert into dummy_TBL (col1) values(1) ; 

create table data_TBL (id int, name string, address struct<city:string,State:string>) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE; 

insert into table data_TBL 
select 1,
      'Bala',
      named_struct('city','Tampa','State','FL') 
from dummy_TBL limit 1;

The address.state value is NULL:

+--------------+----------------+--------------------------------+--+
| data_tbl.id  | data_tbl.name  |        data_tbl.address        |
+--------------+----------------+--------------------------------+--+
| 1            | Bala           | {"city":"Tampa","state":null}  |
+--------------+----------------+--------------------------------+--+

But when you define the table like this (without COLLECTION ITEMS TERMINATED BY comma):

create table data_TBL (id int, name string, address struct<city:string,State:string>) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','  
STORED AS TEXTFILE; 

insert into table data_TBL 
select 1,
      'Bala',
      named_struct('city','Tampa','State','FL') 
from dummy_TBL limit 1;

The address.state value is correct:

+--------------+----------------+--------------------------------+--+
| data_tbl.id  | data_tbl.name  |        data_tbl.address        |
+--------------+----------------+--------------------------------+--+
| 1            | Bala           | {"city":"Tampa","state":"FL"}  |
+--------------+----------------+--------------------------------+--+

I hope this helps.