Created on 12-16-2020 04:52 AM - edited 12-16-2020 04:53 AM
Hello Team,
We are using Impala 3.2.0. There are some queries which were working 3-4 weeks ago and have now suddenly started throwing an "IllegalStateException: null" exception. In the last 3-4 weeks nothing has changed on our Impala cluster, and the error logs do not suggest the cause of the error.
Can anybody help us understand why this issue is arising?
The query which is failing looks something like this:
WITH banner_test AS (WITH layout_id AS
(
SELECT <some columns>
FROM table_v1
WHERE
( layout_component_id = <some_id>)
),
layouts AS
(SELECT <some columns>,
coalesce(<some columns>) AS col1,
<some columns>
FROM table_v2 lch
WHERE lch.some_id IN
(SELECT some_id
FROM table_v3
WHERE some_field_v1 IN
(SELECT some_field from layout_id))
AND lch.start_time >= '2019-01-01T00:00:00.000+00:00'
),
full_data AS
(SELECT lctd.some_field_v1,
l.*,
lctd.some_field_v2,
min(l.start_time)over(PARTITION BY some_field_v1) AS min_start_time,
max(l.end_time)over(PARTITION BY some_field_v1) AS max_end_time
FROM tables_v4 l
INNER JOIN tables_v5 lctd ON l.some_id = lctd.some_id),
tables_v5 AS
(
SELECT <some_fields>,
case when some_field like 'val1%' then 'val1' else 'val2' end as business
FROM tables_v6
WHERE date_str >= '2019-01-01'
)
SELECT <some_fields>,
case when x3.some_field=x2.some_field then x3.some_field end as some_field,
x3.some_id as x3_some_id
FROM
(SELECT DISTINCT some_field,
more_fields
FROM tables_v7 sve
INNER JOIN tables_v8 lfd ON sve.field1 = lfd.field2
AND sve.type = sve.type
AND coalesce(sve.field,"somestring") = lfd.field
AND sve.created_at BETWEEN lfd.min_start_time AND coalesce(lfd.max_end_time,'2030-12-31T00:00:00.000+00:00')
INNER JOIN tables_v9 aau ON sve.some_id = aau.some_id
WHERE aau.some_id IS NOT NULL) x1
INNER JOIN tables_v10 ild on ild.hash_value=x1.hash_value
LEFT JOIN
(SELECT DISTINCT some_field,
<some_fields>
FROM tables_v10 sve
INNER JOIN full_data lfd ON sve.some_id = lfd.some_id)
x2 ON x1.some_id_1 = x2.some_id_1
AND x1.created_at < x2.created_at
and ild.lcid = x2.id
LEFT JOIN
(SELECT <some_fields>
FROM table_v11 where status='SUCCESS')
x3 ON x2._id = x3._id
AND x2.created_at < x3.checked_out_at )
SELECT
field_1,
field_2,
field_3,
TO_DATE(FROM_UTC_TIMESTAMP(started_date ,'Asia/Bangkok')) AS started_date_1,
TO_DATE(FROM_UTC_TIMESTAMP(ended_date ,'Asia/Bangkok')) AS ended_date_1,
(((DATEDIFF(FROM_UTC_TIMESTAMP(banner_ended ,'Asia/Bangkok'), '1970-01-04')%7 + 7)%7 - 1 + 7)%(7)) AS ended_day_of_week_index_1,
DAYNAME(FROM_UTC_TIMESTAMP(banner_ended ,'Asia/Bangkok')) AS _ended_day_of_week_1,
_present_on_screen_id AS _test_screen_id_1,
count(distinct _test.x2_session_id)/count(distinct _test.x1_session_id) AS test_ctr_sessions_1,
count(distinct _test.x2_anonymous_id)/count(distinct _test.x1_anonymous_id) AS test_ctr_aid_1
FROM banner_test
WHERE ((_test.banner_started >= TO_UTC_TIMESTAMP('1900-01-01 00:00:00.000000','Asia/Bangkok'))) AND (banner_test.region REGEXP '$') AND (banner_test.business REGEXP '$')
GROUP BY 1,2,3,4,5,6,7,8
ORDER BY _banner_started_date_1 DESC
LIMIT 500;
I have obfuscated some of the details here, but the information regarding joins, aggregations, group by, etc. is preserved.
Regards
Parth
Created 12-16-2020 10:17 AM
Are you sure there isn't a stack trace for IllegalStateException in the impala daemon logs or catalog daemon logs? That would help match it to a bug.
Created 12-16-2020 10:28 AM
@Tim Armstrong This is the stack trace which I got from the Impala daemon INFO logs:
I1216 18:24:31.520996 5823 jni-util.cc:256] 5348abd63d19b026:621ffa8a00000000] java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at org.apache.impala.planner.SortNode.computeNodeResourceProfile(SortNode.java:258)
at org.apache.impala.planner.PlanFragment.computeResourceProfile(PlanFragment.java:234)
at org.apache.impala.planner.Planner.computeResourceReqs(Planner.java:388)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1178)
at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1466)
at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1345)
at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1252)
at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1222)
at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:167)
I1216 18:24:31.521031 5823 status.cc:124] 5348abd63d19b026:621ffa8a00000000] IllegalStateException: null
Parth
Created 12-18-2020 09:43 AM
@Tim Armstrong Did the logs help?
Created 12-18-2020 01:46 PM
This looks most likely like a bug in the Impala planner, where some of the estimated stats are calculated as negative and trigger that error. I haven't seen exactly this symptom before, but I think it's most likely caused by https://issues.apache.org/jira/browse/IMPALA-7604.
This could cause it if the row-count estimate from the aggregation (GROUP BY) before the sort (ORDER BY) overflows and wraps around to being negative. For that to happen, the product of the distinct value counts of the columns in the GROUP BY would have to be > 2^63. For example, if you have GROUP BY a, b, c and each column has 10 distinct values, you would get a product of 10 * 10 * 10 = 1000; with enough very-high-cardinality columns that product can exceed 2^63 and overflow.
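If you want to check whether that could be happening with your tables, one rough way (just a sketch - the table names here are the obfuscated placeholders from your query, substitute the real ones) is to look at the #Distinct Values Impala has recorded for the columns that end up in the GROUP BY and multiply them together by hand:

-- Show per-column stats, including #Distinct Values (NDV), for the tables feeding the GROUP BY
SHOW COLUMN STATS table_v2;
SHOW COLUMN STATS tables_v6;
-- If the product of the NDVs of the grouped columns exceeds 2^63 (about 9.2e18),
-- the aggregation's row-count estimate can wrap around to a negative value.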
It's possible that the stats changed somehow, and that's why it wasn't being triggered before.
CDH 6.3.4 is the earliest release with a fix for it. You could work around it by dropping stats on one of the tables, but that can have some pretty bad performance implications. You might also be able to work around it by tweaking the query, e.g. grouping by different columns.
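If you go the stats-dropping route, it would look something like this (again just a sketch - table_v2 is one of the placeholder names from your query, pick whichever table's stats are feeding the overflowing estimate):

-- Drop the table and column stats so the planner falls back to default estimates
DROP STATS table_v2;
-- Once you are on a release with the fix, recompute stats so plan quality doesn't regress
COMPUTE STATS table_v2;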
Created 12-19-2020 07:39 AM
@Tim Armstrong The CDH 6.3.4 packaging information suggests that Impala is still at 3.2.0.
Am I looking at the wrong page? We have been eagerly waiting for Impala 3.3.0 & 3.4.0 to be available in a CDH package since September 2019, as they have very good features like Ranger support and caching for remote filesystems like S3.
Is there a way to run Impala 3.4.0 with Cloudera Manager 6.3.1, bypassing the Impala that is provided by CDH 6.3.2?
Created 12-21-2020 09:13 AM
The versions of Apache Impala in Cloudera are "based on" Apache Impala releases but will often include substantial additional features and fixes. We do a lot of work beyond just repackaging the upstream releases. Please consult the Cloudera release notes for the version of CDH/CDP you're using if you want to understand what features are present. Just looking at the version string won't give you that information.
CDH 6.3.4 includes that fix. See https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_634_fixed_issues.ht...
Most new features have been going into CDP only - we have been rolling out new Impala features in CDP public cloud on a fairly continual basis and then these are trickling down into CDP private cloud releases.
Some limited features have gone into minor CDH releases - e.g. 6.3.0 - but we've generally been prioritizing stability there and focusing on CDP. We shipped a preview version of the remote read cache in CDH 6.3.0 - https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_630_new_features.ht... . Ranger support is not in CDH - the general idea there is to allow migrating from Sentry to Ranger as part of the CDH->CDP upgrade process.
Created 12-22-2020 06:24 AM
@Tim Armstrong Thanks for helping out here.
My apologies for the misunderstanding w.r.t. the packaging information.