Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

IllegalStateException during insert into table

avatar

Hi,

 I am inserting a result of simple aggregation and my query fails when the result of the aggregation is an empty record set. The results are inserted into a partitioned table. The insert fails with this message:

 

ERROR: IllegalStateException: null.

 

The query which should prepare the data is containing aggregations. I thought the error is because the query does not return any row, but if I change the subquery from GROUP by to a simple select, then it works, and Inserts 0 rows.


create table tmp.tab (
file_name string,
rc int
) partitioned by ( day_id int ) stored as parquet;

 

 

-- this fails:

insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t;

 

-- this works:

insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t;

 

Using Impala  2.2.0-cdh5.4.8 

Tomas

 

1 ACCEPTED SOLUTION

avatar

Update:

 

the explain function on the problematic query fails:

 

explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t;
Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t
ERROR: IllegalStateException: null

but the explain on the working query suceeds:

explain insert into tmp.tab partition (day_id)
                  > select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
                  > from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t;
Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t
+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| WARNING: The following tables are missing relevant table and/or column statistics.                                       |
| tmp.tab                                                                                                                  |
|                                                                                                                          |
| WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] |
| |  partitions=1                                                                                                          |
| |                                                                                                                        |
| 00:SCAN HDFS [tmp.tab]                                                                                                   |
|    partitions=0/0 files=0 size=0B                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
Fetched 8 row(s) in 0.03s

I noticed the warning about missing stats on empty table, so I did compute stats on tm,p.tab

 

This helped, so the problem is solved!

 

Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t
+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=0                                                                      |
|                                                                                                                          |
| WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] |
| |  partitions=1                                                                                                          |
| |                                                                                                                        |
| 01:AGGREGATE [FINALIZE]                                                                                                  |
| |  output: count(*), max(file_name)                                                                                      |
| |  group by: file_name                                                                                                   |
| |                                                                                                                        |
| 00:SCAN HDFS [tmp.tab]                                                                                                   |
|    partitions=0/0 files=0 size=0B                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
Fetched 11 row(s) in 0.03s

View solution in original post

1 REPLY 1

avatar

Update:

 

the explain function on the problematic query fails:

 

explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t;
Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t
ERROR: IllegalStateException: null

but the explain on the working query suceeds:

explain insert into tmp.tab partition (day_id)
                  > select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
                  > from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t;
Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t
+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| WARNING: The following tables are missing relevant table and/or column statistics.                                       |
| tmp.tab                                                                                                                  |
|                                                                                                                          |
| WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] |
| |  partitions=1                                                                                                          |
| |                                                                                                                        |
| 00:SCAN HDFS [tmp.tab]                                                                                                   |
|    partitions=0/0 files=0 size=0B                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
Fetched 8 row(s) in 0.03s

I noticed the warning about missing stats on empty table, so I did compute stats on tm,p.tab

 

This helped, so the problem is solved!

 

Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t
+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=0                                                                      |
|                                                                                                                          |
| WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] |
| |  partitions=1                                                                                                          |
| |                                                                                                                        |
| 01:AGGREGATE [FINALIZE]                                                                                                  |
| |  output: count(*), max(file_name)                                                                                      |
| |  group by: file_name                                                                                                   |
| |                                                                                                                        |
| 00:SCAN HDFS [tmp.tab]                                                                                                   |
|    partitions=0/0 files=0 size=0B                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
Fetched 11 row(s) in 0.03s