Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How to insert NULL value into Hive complex columns (array & struct),Having NULL value for Hive complex column (struct & array)

avatar
Explorer

We have a Hive table with some columns being arrays and structs. When inserting a row into the table, if we do not have any value for the array and struct column and want to insert a NULL value for them, how do we specify in the INSERT statement the NULL values?

For e.g., if the table definition is:

C1 string

C2 struct<c2_a:array<string>, c2_c:string>

C3 array<struct<c3_a:string, c3_b:string>>

then to insert NULL value to C2 and C3 columns, I was expecting to have the INSERT statement as: “insert into table some_test_table select "c1_val", NULL, NULL from z_dummy;” where z_dummy is a 1 row table. But Hive reports a syntax error.

Any idea on how to specify the correct syntax to use for the above example?

8 REPLIES 8

avatar

Hive does not support literals for complex types (array, map, struct, union), so it is not possible to use them in INSERT INTO...VALUES clauses. This means that the user cannot insert data into a complex datatype column using the INSERT INTO...VALUES clause. link

avatar
Explorer

Hive does support literals for complex types. I didnt find any documentation for it, but I have the syntax that works.

avatar

When you load data into hive table the default Missing values are represented by the special value NULL. SO for simple or complex queries its not possible to insert null values into hive tables using INSERT INTO clause.

avatar
Rising Star

@Ravi Chinni

insert into some_test_table
select 'c1_val',named_struct('c2_a',array(cast (null as string)),'c2_c',cast (null as string)),array(named_struct('c3_a',cast (null as string) ,'c3_b',cast (null as string))) from z_dummy;

This will work for you.

Needless to say, please upvote if the answer was useful. 🙂

avatar
Explorer

@Sagar Morakhia

When you say 'array(cast (null as string))', aren't we inserting one element, whose value is null into the array? The array size here is 1 and the array itself isn't null. For e.g. when I select the row that was inserted using the above statement, I get the following:

hive> select C2, C3 from complex_null_test;

{"c2_a":null,"c2_c":null} [{"c3_a":null,"c3_b":null}]

Notice in the above that the array itself has 1 element of struct type whose values are null.

avatar
@Ravi Chinni

When you load data the default Missing values are represented by the special value NULL.

For Example: Find the sample data:

1,abc,4000,maths$physics$chemistry,hyd

2,hadoop,3000,maths$physics,bang

3,mapreduce,4000,social,bang

For the above data the table create statement: create table array_table(id int,name string,sal int,sub array<string>, city string) row format delimited fields terminated by ',' collection items terminated by '$';

Load statement:

load data local inpath '/Users/abc/Downloads/arrayfile.txt' into table array_table;

hive> select * from array_table;

OK

1abc40000["maths","physics","chemistry"]hyd

2hadoop3000["maths","physics"]bang

3mapreduce4000["social"]bang

Time taken: 0.086 seconds, Fetched: 3 row(s)

hive> select * from array_table where sub[2] is not null;

OK

1abc40000["maths","physics","chemistry"]hyd

Time taken: 0.091 seconds, Fetched: 1 row(s)

hive> select * from array_table where sub[1] is not null;

OK

1abc40000["maths","physics","chemistry"]hyd

2hadoop3000["maths","physics"]bang

Time taken: 0.089 seconds, Fetched: 2 row(s)

Is this what you are looking for.? Can you kinldy respond back if it meets your requirement.?

avatar
Explorer

How do you load data into the above table, with data for some complex columns missing?

avatar

@Ravi Chinni

When you load data into hive table the default Missing values are represented by the special value NULL.

For Example:

1,abc,40000,maths$physics$chemistry,hyd

2,hadoop,3000,maths$physics,bang

3,mapreduce,4000,social,bang

Find the Below Queries:

hive> create table array_table(id int,name string,sal int,sub array<string>, city string) row format delimited fields terminated by ',' collection items terminated by '$';

OK

Time taken: 0.06 seconds

hive> load data local inpath '/Users/abc/sample.txt' into table array_table;

Loading data to table 7may.array_table

Table 7may.array_table stats: [numFiles=1, totalSize=106]

OK

Time taken: 0.146 seconds

hive> select * from array_table;

OK

1abc40000["maths","physics","chemistry"]hyd

2hadoop3000["maths","physics"]bang

3mapreduce4000["social"]bang

Time taken: 0.121 seconds, Fetched: 3 row(s)

hive> select * from array_table where sub[2] is not null;

OK

1abc40000["maths","physics","chemistry"]hyd

Time taken: 0.091 seconds, Fetched: 1 row(s)

hive> select * from array_table where sub[1] is not null;

OK

1abc40000["maths","physics","chemistry"]hyd

2hadoop3000["maths","physics"]bang

Time taken: 0.089 seconds, Fetched: 2 row(s)

Kindly reply if it is useful for your requirement.