Created on 10-04-2016 02:50 AM - edited 09-16-2022 03:43 AM
Hi,
I am inserting a result of simple aggregation and my query fails when the result of the aggregation is an empty record set. The results are inserted into a partitioned table. The insert fails with this message:
ERROR: IllegalStateException: null.
The query which should prepare the data is containing aggregations. I thought the error is because the query does not return any row, but if I change the subquery from GROUP by to a simple select, then it works, and Inserts 0 rows.
create table tmp.tab (
file_name string,
rc int
) partitioned by ( day_id int ) stored as parquet;
-- this fails:
insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t;
-- this works:
insert into tmp.tab partition (day_id)
select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID
from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t;
Using Impala 2.2.0-cdh5.4.8
Tomas
Created 10-05-2016 02:34 AM
Update:
the explain function on the problematic query fails:
explain insert into tmp.tab partition (day_id) select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t; Query: explain insert into tmp.tab partition (day_id) select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t ERROR: IllegalStateException: null
but the explain on the working query suceeds:
explain insert into tmp.tab partition (day_id) > select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID > from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t; Query: explain insert into tmp.tab partition (day_id) select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t +--------------------------------------------------------------------------------------------------------------------------+ | Explain String | +--------------------------------------------------------------------------------------------------------------------------+ | WARNING: The following tables are missing relevant table and/or column statistics. | | tmp.tab | | | | WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] | | | partitions=1 | | | | | 00:SCAN HDFS [tmp.tab] | | partitions=0/0 files=0 size=0B | +--------------------------------------------------------------------------------------------------------------------------+ Fetched 8 row(s) in 0.03s
I noticed the warning about missing stats on empty table, so I did compute stats on tm,p.tab
This helped, so the problem is solved!
Query: explain insert into tmp.tab partition (day_id) select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t +--------------------------------------------------------------------------------------------------------------------------+ | Explain String | +--------------------------------------------------------------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=0B VCores=0 | | | | WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] | | | partitions=1 | | | | | 01:AGGREGATE [FINALIZE] | | | output: count(*), max(file_name) | | | group by: file_name | | | | | 00:SCAN HDFS [tmp.tab] | | partitions=0/0 files=0 size=0B | +--------------------------------------------------------------------------------------------------------------------------+ Fetched 11 row(s) in 0.03s
Created 10-05-2016 02:34 AM
Update:
the explain function on the problematic query fails:
explain insert into tmp.tab partition (day_id) select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t; Query: explain insert into tmp.tab partition (day_id) select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t ERROR: IllegalStateException: null
but the explain on the working query suceeds:
explain insert into tmp.tab partition (day_id) > select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID > from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t; Query: explain insert into tmp.tab partition (day_id) select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID from ( select rc as ROW_COUNT, file_name as fn from tmp.tab ) t +--------------------------------------------------------------------------------------------------------------------------+ | Explain String | +--------------------------------------------------------------------------------------------------------------------------+ | WARNING: The following tables are missing relevant table and/or column statistics. | | tmp.tab | | | | WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] | | | partitions=1 | | | | | 00:SCAN HDFS [tmp.tab] | | partitions=0/0 files=0 size=0B | +--------------------------------------------------------------------------------------------------------------------------+ Fetched 8 row(s) in 0.03s
I noticed the warning about missing stats on empty table, so I did compute stats on tm,p.tab
This helped, so the problem is solved!
Query: explain insert into tmp.tab partition (day_id) select t.fn, cast( t.ROW_COUNT as int ), cast(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') as int) as DAY_ID from ( select count(*) as row_count, max( file_name ) as fn from tmp.tab group by file_name ) t +--------------------------------------------------------------------------------------------------------------------------+ | Explain String | +--------------------------------------------------------------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=0B VCores=0 | | | | WRITE TO HDFS [tmp.tab, OVERWRITE=false, PARTITION-KEYS=(CAST(from_unixtime(unix_timestamp(now()), 'yyyyMMdd') AS INT))] | | | partitions=1 | | | | | 01:AGGREGATE [FINALIZE] | | | output: count(*), max(file_name) | | | group by: file_name | | | | | 00:SCAN HDFS [tmp.tab] | | partitions=0/0 files=0 size=0B | +--------------------------------------------------------------------------------------------------------------------------+ Fetched 11 row(s) in 0.03s