Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

IllegalStateException during insert into table

avatar

Hi,

 I am inserting a result of simple aggregation and my query fails when the result of the aggregation is an empty record set. The results are inserted into a partitioned table. The insert fails with this message:

 

ERROR: IllegalStateException: null.

 

The query which should prepare the data is containing aggregations. I thought the error is because the query does not return any row, but if I change the subquery from GROUP by to a simple select, then it works, and Inserts 0 rows.


create table tmp.tab (
file_name string,
rc int
) partitioned by ( day_id int ) stored as parquet;

 

 

-- this fails:

insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t;

 

-- this works:

insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t;

 

Using Impala  2.2.0-cdh5.4.8 

Tomas

 

1 ACCEPTED SOLUTION

avatar

Update:

 

the explain function on the problematic query fails:

 

explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t;
Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t
ERROR: IllegalStateException: null

but the explain on the working query suceeds:

explain insert into tmp.tab partition (day_id)
                  > select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
                  > from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t;
Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t
+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| WARNING: The following tables are missing relevant table and/or column statistics.                                       |
| tmp.tab                                                                                                                  |
|                                                                                                                          |
| WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] |
| |  partitions=1                                                                                                          |
| |                                                                                                                        |
| 00:SCAN HDFS [tmp.tab]                                                                                                   |
|    partitions=0/0 files=0 size=0B                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
Fetched 8 row(s) in 0.03s

I noticed the warning about missing stats on empty table, so I did compute stats on tm,p.tab

 

This helped, so the problem is solved!

 

Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t
+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=0                                                                      |
|                                                                                                                          |
| WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] |
| |  partitions=1                                                                                                          |
| |                                                                                                                        |
| 01:AGGREGATE [FINALIZE]                                                                                                  |
| |  output: count(*), max(file_name)                                                                                      |
| |  group by: file_name                                                                                                   |
| |                                                                                                                        |
| 00:SCAN HDFS [tmp.tab]                                                                                                   |
|    partitions=0/0 files=0 size=0B                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
Fetched 11 row(s) in 0.03s

View solution in original post

1 REPLY 1

avatar

Update:

 

the explain function on the problematic query fails:

 

explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t;
Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t
ERROR: IllegalStateException: null

but the explain on the working query suceeds:

explain insert into tmp.tab partition (day_id)
                  > select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
                  > from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t;
Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t
+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| WARNING: The following tables are missing relevant table and/or column statistics.                                       |
| tmp.tab                                                                                                                  |
|                                                                                                                          |
| WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] |
| |  partitions=1                                                                                                          |
| |                                                                                                                        |
| 00:SCAN HDFS [tmp.tab]                                                                                                   |
|    partitions=0/0 files=0 size=0B                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
Fetched 8 row(s) in 0.03s

I noticed the warning about missing stats on empty table, so I did compute stats on tm,p.tab

 

This helped, so the problem is solved!

 

Query: explain insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t
+--------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=0                                                                      |
|                                                                                                                          |
| WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] |
| |  partitions=1                                                                                                          |
| |                                                                                                                        |
| 01:AGGREGATE [FINALIZE]                                                                                                  |
| |  output: count(*), max(file_name)                                                                                      |
| |  group by: file_name                                                                                                   |
| |                                                                                                                        |
| 00:SCAN HDFS [tmp.tab]                                                                                                   |
|    partitions=0/0 files=0 size=0B                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
Fetched 11 row(s) in 0.03s