Created on 05-10-2018 07:08 AM - edited 09-16-2022 06:12 AM
Query:
SELECT
SUM(fpe.measure_EncounterInstanceCount) AS TOTAL_INSTANCE_COUNT
FROM
fact_PatientEncounter fpe
inner join
(
SELECT DISTINCT
CC1.DispatchedDate_Encoded,
CC1.FactPatientEncounter_HashKey
FROM
dim_PatientEncounter_CategorizedCollections cc1
WHERE
((cc1.dispatcheddate_encoded >= 20170101) and (cc1.dispatcheddate_encoded <= 20180417))
AND cc1.Collection = 'TEST'
AND ((CC1.Value = 'val1') OR (CC1.Value = 'val2'))
) M
ON fPE.DispatchedDate_Encoded = M.DispatchedDate_Encoded
AND fPE.FactPatientEncounter_HashKey = M.FactPatientEncounter_HashKey
INNER JOIN
(
SELECT DISTINCT
cc2.DispatchedDate_Encoded,
cc2.FactPatientEncounter_HashKey
FROM
dim_PatientEncounter_CategorizedCollections cc2
WHERE
((cc2.dispatcheddate_encoded >= 20170101) and (cc2.dispatcheddate_encoded <= 20180417))
AND cc2.Collection = 'TEST'
AND ((cc2.Value = 'val3') OR (cc2.Value = 'val4'))
) N
ON fpe.DispatchedDate_Encoded = N.DispatchedDate_Encoded
AND fpe.FactPatientEncounter_HashKey = N.FactPatientEncounter_HashKey
where
((fpe.RecordStatusDeleted IS NULL) and (fpe.RecordStatus IS NULL))
AND ((fpe.dispatcheddate_encoded >= 20170101) and (fpe.dispatcheddate_encoded <= 20180417))
Returns nulls. The same query with the same dataset returns data fine on SQL Server. Also, each component of the query if run separately also works fine. Is there any Impala/kudu bugs with joining the same table multiple times?
Thanks.
Created 05-10-2018 09:53 AM
Hi,
Is it possible to reproduce this with a smaller sample data set that you can share? I'm not aware of any such bugs but it's possible you've discovered something new.
-Todd
Created 05-10-2018 11:57 AM
Might be a known Impala planner bug. What version of Impala are you running? Can you share the EXPLAIN output?
Created 05-10-2018 12:00 PM
Created 05-10-2018 02:36 PM
Thanks. The plan looks good to me. I'm not aware of any issue like this one.
Would you be able to help us debug the query? This might be helpful:
- Are the results deterministically wrong or are they non-deterministic?
- Do you think column value sare wrong or are we producing an incorrect number of rows somewhere?
- Looking at the rows produced by different operators in the SUMMARY of the query profile might help pinpoint where things are going wrong
- Try running the query with straight_join and broadcast hints and see if the query results change. Like this:
SELECT /* +straight_join */ SUM(...)
FROM fpe
INNER JOIN /* +broadcast */ (SELECT DISTINCT ...) M ON (...)
INNER JOIN /* +broadcast */ (SELECT DISTINCT ...) N ON (...)
WHERE ...
Created 05-11-2018 07:20 AM
Adding the Broadcast query hint did not help, still get null. Thanks.
Created 05-15-2018 11:45 AM
Since Kudu does not support ACID transactions I wonder if it's possible you could be hitting issues with data consistency? Is the fact_PatientEncounter table actively receiving writes? If so, I wonder if fpe.measure_EncounterInstanceCount could be NULL for some row while fpe.RecordStatusDeleted or fpe.RecordStatus is also NULL (even for a short period of time).
Created 05-29-2019 09:05 PM
Please check the query again.
Subqueries M and N have the exact same filters except that in M you filter for Value = 'val1' or 'val2' and in N you filter for Value = 'val3' or 'val4'.
So, this would make the the rows returned by M and N mutually exclusive, hence NULL when you do an inner join between them.