Created 06-09-2017 08:30 PM
We have a Hive table with some columns being arrays and structs. When inserting a row into the table, if we do not have any value for the array and struct column and want to insert a NULL value for them, how do we specify in the INSERT statement the NULL values?
For e.g., if the table definition is:
C1 string
C2 struct<c2_a:array<string>, c2_c:string>
C3 array<struct<c3_a:string, c3_b:string>>
then to insert NULL value to C2 and C3 columns, I was expecting to have the INSERT statement as: “insert into table some_test_table select "c1_val", NULL, NULL from z_dummy;” where z_dummy is a 1 row table. But Hive reports a syntax error.
Any idea on how to specify the correct syntax to use for the above example?
Created 06-12-2017 09:09 AM
Hive does not support literals for complex types (array, map, struct, union), so it is not possible to use them in INSERT INTO...VALUES clauses. This means that the user cannot insert data into a complex datatype column using the INSERT INTO...VALUES clause. link
Created 06-12-2017 03:28 PM
Hive does support literals for complex types. I didnt find any documentation for it, but I have the syntax that works.
Created 06-12-2017 11:35 AM
When you load data into hive table the default Missing values are represented by the special value NULL. SO for simple or complex queries its not possible to insert null values into hive tables using INSERT INTO clause.
Created 06-12-2017 01:48 PM
insert into some_test_table select 'c1_val',named_struct('c2_a',array(cast (null as string)),'c2_c',cast (null as string)),array(named_struct('c3_a',cast (null as string) ,'c3_b',cast (null as string))) from z_dummy;
This will work for you.
Needless to say, please upvote if the answer was useful. 🙂
Created 06-12-2017 03:25 PM
When you say 'array(cast (null as string))', aren't we inserting one element, whose value is null into the array? The array size here is 1 and the array itself isn't null. For e.g. when I select the row that was inserted using the above statement, I get the following:
hive> select C2, C3 from complex_null_test;
{"c2_a":null,"c2_c":null} [{"c3_a":null,"c3_b":null}]
Notice in the above that the array itself has 1 element of struct type whose values are null.
Created 06-12-2017 01:48 PM
When you load data the default Missing values are represented by the special value NULL.
For Example: Find the sample data:
1,abc,4000,maths$physics$chemistry,hyd
2,hadoop,3000,maths$physics,bang
3,mapreduce,4000,social,bang
For the above data the table create statement: create table array_table(id int,name string,sal int,sub array<string>, city string) row format delimited fields terminated by ',' collection items terminated by '$';
Load statement:
load data local inpath '/Users/abc/Downloads/arrayfile.txt' into table array_table;
hive> select * from array_table;
OK
1abc40000["maths","physics","chemistry"]hyd
2hadoop3000["maths","physics"]bang
3mapreduce4000["social"]bang
Time taken: 0.086 seconds, Fetched: 3 row(s)
hive> select * from array_table where sub[2] is not null;
OK
1abc40000["maths","physics","chemistry"]hyd
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive> select * from array_table where sub[1] is not null;
OK
1abc40000["maths","physics","chemistry"]hyd
2hadoop3000["maths","physics"]bang
Time taken: 0.089 seconds, Fetched: 2 row(s)
Is this what you are looking for.? Can you kinldy respond back if it meets your requirement.?
Created 06-12-2017 03:27 PM
How do you load data into the above table, with data for some complex columns missing?
Created 06-12-2017 01:48 PM
When you load data into hive table the default Missing values are represented by the special value NULL.
For Example:
1,abc,40000,maths$physics$chemistry,hyd
2,hadoop,3000,maths$physics,bang
3,mapreduce,4000,social,bang
Find the Below Queries:
hive> create table array_table(id int,name string,sal int,sub array<string>, city string) row format delimited fields terminated by ',' collection items terminated by '$';
OK
Time taken: 0.06 seconds
hive> load data local inpath '/Users/abc/sample.txt' into table array_table;
Loading data to table 7may.array_table
Table 7may.array_table stats: [numFiles=1, totalSize=106]
OK
Time taken: 0.146 seconds
hive> select * from array_table;
OK
1abc40000["maths","physics","chemistry"]hyd
2hadoop3000["maths","physics"]bang
3mapreduce4000["social"]bang
Time taken: 0.121 seconds, Fetched: 3 row(s)
hive> select * from array_table where sub[2] is not null;
OK
1abc40000["maths","physics","chemistry"]hyd
Time taken: 0.091 seconds, Fetched: 1 row(s)
hive> select * from array_table where sub[1] is not null;
OK
1abc40000["maths","physics","chemistry"]hyd
2hadoop3000["maths","physics"]bang
Time taken: 0.089 seconds, Fetched: 2 row(s)
Kindly reply if it is useful for your requirement.