Created on 04-25-2018 11:51 AM - edited 09-16-2022 06:08 AM
For this simple query of a view,
select * from view_name where column_name is not null;
it is returning all records from the view (null and not null) in Impala. The condition is being ignored. Is there a work around for this? I've tried coalesce and nullif tricks and it still doesn't work and I still get all records from the view. The view was created using Hive. Does that make a difference? I don't have this problem when working with tables in Impala.
Created on 04-25-2018 01:00 PM - edited 04-25-2018 01:00 PM
This is expected to work. What's the view definition? Can you post the EXPLAIN plan of your query?
Created 05-04-2018 06:25 AM
Do you want to see the SQL code that created the view? Do you want an expain plan for that or the simple select * from view_name where column_name is not null?
Created 05-07-2018 10:44 PM
Yes to both.
Created 05-08-2018 05:05 AM
create or replace view data_call.view_comp_stat_by_qtr
as
select q.full_quarter,
q.reporting_deadline,
cc.aais_company_number,
c.name as company_name,
l.lob_name_short as LOB,
l.lob_code as line_code,
q.quarter_id,
q.year,
--min(t.src_created_date) as rpt_dt, --Changed to max to account for finishing instead of just starting - mikeg
max(t.src_created_date) as rpt_dt,
sum(t.written_premium) as written_premium,
sum(t.record_count) as record_count,
t.transmittal_status_name as trans_status,
nvl(qtrs.quarters, 0) as quarters_reported,
sum(coalesce(t.paid_loss,0)) as paid_loss,
sum(coalesce(t.outstanding_loss,0)) as outstanding_loss,
sum(coalesce(t.paid_claim_count,0)) as paid_claim_count,
sum(coalesce(t.outstanding_claim_count,0)) as outstanding_claim_count,
sum(coalesce(t.paid_lae,0)) as paid_lae,
sum(coalesce(t.outstanding_lae,0)) as outstanding_lae
from data_call.view_sdma_quarters q
cross join (select distinct company_code as aais_company_number, company_id from prod_integration.dim_company where is_active = 'Y') cc
left join (select t1.*
from prod_integration.dim_transmittal t1
where t1.transmittal_status_name in ('AAIS Accepted', 'AAIS Acknowledged')) t on t.year = q.year and t.quarter_name = q.quarter_name and t.aais_company_number = cc.aais_company_number
left join (select * from prod_integration.dim_company where is_active = 'Y') c on t.aais_company_number = c.company_code and c.company_code = cc.aais_company_number
left join prod_integration.dim_lob l on t.lob_number = l.lob_code
left join (select aais_company_number as company_id, year, lob_number as line_id, count(distinct quarter_name) as quarters from prod_integration.dim_transmittal group by aais_company_number, year, lob_number) qtrs on qtrs.company_id = c.company_code and qtrs.year = t.year and qtrs.line_id = l.lob_code
group by q.full_quarter,
q.reporting_deadline,
cc.aais_company_number,
c.name,
l.lob_name_short,
l.lob_code,
q.quarter_id,
q.year,
t.transmittal_status_name,
nvl(qtrs.quarters, 0);
Explain plan for select * from data_call.view_comp_stat_by_qtr where company_name is not null;
Estimated Per-Host Requirements: Memory=288.00MB VCores=4
WARNING: The following tables are missing relevant table and/or column statistics.
prod_integration.dim_company, prod_integration.dim_lob, prod_integration.dim_transmittal
17:AGGREGATE [FINALIZE]
| output: max(t1.src_created_date), sum(t1.written_premium), sum(t1.record_count), sum(coalesce(t1.paid_loss, 0)), sum(coalesce(t1.outstanding_loss, 0)), sum(coalesce(t1.paid_claim_count, 0)), sum(coalesce(t1.outstanding_claim_count, 0)), sum(coalesce(t1.paid_lae, 0)), sum(coalesce(t1.outstanding_lae, 0))
| group by: concat(CAST(t.year AS STRING), '-', t.quarter_name), date_add(to_date(date_sub(add_months(concat(from_unixtime(unix_timestamp(concat(CAST(t.year AS STRING), '-', (CASE WHEN q1.id = 1 THEN '03' WHEN q1.id = 2 THEN '06' WHEN q1.id = 3 THEN '09' ELSE '12' END), '-01'), 'yyyy-MM-dd'), 'yyyy-MM'), '-01'), 1), 1)), 60), dim_company.company_code, dim_company.name, l.lob_name_short, l.lob_code, q1.id, t.year, t1.transmittal_status_name, nvl(count(dim_transmittal.quarter_name), 0)
|
16:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| hash predicates: t1.year = dim_transmittal.year, dim_company.company_code = dim_transmittal.aais_company_number, l.lob_code = dim_transmittal.lob_number
|
|--27:EXCHANGE [UNPARTITIONED]
| |
| 11:AGGREGATE [FINALIZE]
| | output: count(dim_transmittal.quarter_name)
| | group by: dim_transmittal.aais_company_number, dim_transmittal.year, dim_transmittal.lob_number
| |
| 26:AGGREGATE
| | group by: dim_transmittal.aais_company_number, dim_transmittal.year, dim_transmittal.lob_number, dim_transmittal.quarter_name
| |
| 25:EXCHANGE [HASH(dim_transmittal.aais_company_number,dim_transmittal.year,dim_transmittal.lob_number)]
| |
| 10:AGGREGATE [STREAMING]
| | group by: dim_transmittal.aais_company_number, dim_transmittal.year, dim_transmittal.lob_number, dim_transmittal.quarter_name
| |
| 09:SCAN HDFS [prod_integration.dim_transmittal]
| partitions=1/1 files=1 size=4.52MB
|
15:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| hash predicates: t1.lob_number = l.lob_code
|
|--24:EXCHANGE [UNPARTITIONED]
| |
| 08:SCAN HDFS [prod_integration.dim_lob l]
| partitions=1/1 files=1 size=1.62KB
|
14:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| hash predicates: t1.aais_company_number = dim_company.company_code, dim_company.company_code = dim_company.company_code
|
|--23:EXCHANGE [UNPARTITIONED]
| |
| 07:SCAN HDFS [prod_integration.dim_company]
| partitions=1/1 files=1 size=356.82KB
| predicates: prod_integration.dim_company.name IS NOT NULL, dim_company.is_active = 'Y'
|
13:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
| hash predicates: t.year = t1.year, t.quarter_name = t1.quarter_name, dim_company.company_code = t1.aais_company_number
|
|--22:EXCHANGE [UNPARTITIONED]
| |
| 06:SCAN HDFS [prod_integration.dim_transmittal t1]
| partitions=1/1 files=1 size=4.52MB
| predicates: t1.transmittal_status_name IN ('AAIS Accepted', 'AAIS Acknowledged')
|
12:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]
|
|--21:EXCHANGE [UNPARTITIONED]
| |
| 20:AGGREGATE [FINALIZE]
| | group by: dim_company.company_code, dim_company.company_id
| |
| 19:EXCHANGE [HASH(dim_company.company_code,dim_company.company_id)]
| |
| 05:AGGREGATE [STREAMING]
| | group by: dim_company.company_code, dim_company.company_id
| |
| 04:SCAN HDFS [prod_integration.dim_company]
| partitions=1/1 files=1 size=356.82KB
| predicates: dim_company.is_active = 'Y'
|
03:AGGREGATE [FINALIZE]
| group by: t.year, id, t.quarter_name, concat(CAST(t.year AS STRING), '-', t.quarter_name), date_add(to_date(date_sub(add_months(concat(from_unixtime(unix_timestamp(concat(CAST(t.year AS STRING), '-', (CASE WHEN id = 1 THEN '03' WHEN id = 2 THEN '06' WHEN id = 3 THEN '09' ELSE '12' END), '-01'), 'yyyy-MM-dd'), 'yyyy-MM'), '-01'), 1), 1)), 60), to_date(concat(CAST(t.year AS STRING), '-', (CASE WHEN id = 1 THEN '03' WHEN id = 2 THEN '06' WHEN id = 3 THEN '09' ELSE '12' END), '-01'))
|
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: name = t.quarter_name
|
|--18:EXCHANGE [UNPARTITIONED]
| |
| 00:SCAN HDFS [prod_integration.dim_transmittal t]
| partitions=1/1 files=1 size=4.52MB
|
01:UNION
constant-operands=4
Created 05-10-2018 02:49 PM
The IS NOT NULL predicate is not placed correcly in the execution plan, that's why the result are wrong. We've fied many similar issues, chances are this one is known and fixed. I will need to confirm. What version of Impala are you running?
You might be able to work around the issue like this.
Instead of "where company_name is not null" use "where coalesce(company_name, aais_company_number) is not null".
You might need to "cast(aais_company_number as string)"
I understand it's not pretty but hopefully it does the trick.
Created 05-11-2018 05:07 AM
"where coalesce(company_name, aais_company_number) is not null" produces the same result because aais_company_number is never null so I still get all the records.
We are using Hue 3.10 and I don't know which version of Impala comes with that. I just started working in this environment recently.
Created 09-21-2022 10:50 PM
Try using length function where length(company name ) >1