Created on 11-06-2018 06:48 AM - edited 09-16-2022 06:52 AM
Today we ran into the following situation:
different Impala queries against the same table returned inconsistent results: one showed that the table contains data, another showed that it is empty.
There were no data modifications in between, and the queries were executed several times in varying order.
The table is stored as Kudu.
The shown results are from impala-shell.
Environment:
CDH 5.15.0
Kudu 1.7.0-cdh5.15.0 (3 masters + 16 tservers)
Impala-shell v2.12.0-cdh5.15.0
Query1.
[node009.mydomain.net:21000] > select * from mydb1.table1 limit 20 ;
Query: select * from mydb1.table1 limit 20
Query submitted at: 2018-11-06 13:30:19 (Coordinator: http://node009:25000)
Query progress can be monitored at: http://node009:25000/query_plan?query_id=d84287045f155a50:251e230100000000
+---------+------------+---------------+----------+-----------+----------+
| myfield1| myfield2   | myfield3      | myfield4 | myfield5  | myfield6 |
+---------+------------+---------------+----------+-----------+----------+
| 19      | 0          | 1279900254208 | z0012    | 22        | M        |
...
| 302     | 0          | 1194001234293 | c1236    | 3         | A        |
+---------+------------+---------------+----------+-----------+----------+
Fetched 20 row(s) in 21.13s
Inspecting the tablets in Kudu with "kudu fs list" shows multiple rowsets with data.
Query2.
[node009.mydomain.net:21000] > select count(*) from mydb1.table1 ;
Query: select count(*) from mydb1.table1
Query submitted at: 2018-11-06 13:32:56 (Coordinator: http://node009:25000)
Query progress can be monitored at: http://node009:25000/query_plan?query_id=874eb1365115b065:5527220e00000000
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Fetched 1 row(s) in 34.71s
It may be worth mentioning that the query time is quite long, as it would be if the table actually contained data.
Created on 11-06-2018 05:05 PM - edited 11-06-2018 05:08 PM
Hi Andreyeff,
You mentioned there were 'no data modifications in between' the queries, but did you verify whether any ingestion (that hadn't finished) was still going on while you issued these queries? Since Impala uses the READ_LATEST scan mode, it is possible that the scan took place on a stale replica. Also, did you run the `ksck` tool to check whether the cluster is in a healthy state?
Best,
Hao
Created 11-07-2018 12:56 AM
These queries were executed multiple times in an alternating order, something like "query1", "query2", "query1", "query2", ..., for about half an hour.
The Kudu cluster was checked with ksck as well, and it was healthy.
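For anyone who wants to repeat this check, the alternating "query1"/"query2" comparison can be sketched roughly as below. This is only a minimal sketch: `run_query` is a hypothetical callable (not part of impala-shell or any Impala client) that you would wire up to your own client and that returns the result rows as a list.

```python
from typing import Callable, List, Sequence


def count_agrees_with_sample(
    run_query: Callable[[str], List[Sequence]], table: str
) -> bool:
    """Return True when count(*) and a sample SELECT agree on emptiness.

    run_query is a hypothetical helper supplied by the caller; it takes a
    SQL string and returns the fetched rows as a list of sequences.
    """
    sample_rows = run_query(f"select * from {table} limit 20")
    row_count = run_query(f"select count(*) from {table}")[0][0]
    # The symptom in this thread: the sample has rows while count(*) is 0.
    return (len(sample_rows) > 0) == (row_count > 0)


def repeated_check(
    run_query: Callable[[str], List[Sequence]], table: str, attempts: int = 5
) -> List[bool]:
    """Run the comparison several times, as was done by hand above."""
    return [count_agrees_with_sample(run_query, table) for _ in range(attempts)]
```

A list containing only False values would correspond to what we observed: the mismatch was stable across repetitions rather than intermittent.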
Created 11-07-2018 09:10 AM
Hi Andreyeff,
Does 'query1' always return consistent results? And the same for 'query2'? If you run 'query1' and 'query2' now, do they return consistent results?
Best,
Hao
Created on 11-07-2018 09:38 AM - edited 11-07-2018 09:43 AM
Hi Andreyeff,
It's possible this is an instance of KUDU-2463. Does the following description match your case?
Are you actively writing to the table regularly? Based on the schema, are you writing to every tablet? If you are, that is evidence against the issue being KUDU-2463.
When was the last time you restarted a tablet server? If the incorrect results were only noticed after a restart, that is evidence for the issue being KUDU-2463.
Best,
Hao
Created 11-14-2018 07:17 AM
Yes, 'query1' always returned data, but 'query2' always returned an empty result.
By "now", did you mean after some time, during which some inserts were done and the timestamps moved forward? Unfortunately, we recreated the table after a while, so I can't say whether writing would have helped.
Thanks for the link to ticket KUDU-2463.
1. There were inserts the previous evening, when the table was populated. It is unlikely that there were any changes after that.
2. I had a look at the logs: all tservers were restarted a few hours before the select queries.
So far the evidence points to KUDU-2463. If it happens again, I will try writing to the table.
I've checked with the developers, so to sum up together with your comments:
The previous evening all records were deleted (the table itself remained) and data was inserted (via Spark).
The next morning the tservers were restarted.
The inconsistency was found roughly 8 hours or more after the insert had finished.
Created on 11-16-2018 12:01 PM - edited 11-16-2018 01:45 PM
So it looks like you have already fixed the table? As a longer-term solution, I suggest upgrading to a CDH version that includes the fix for KUDU-2463 (5.15.2, 5.16.2, or 6.1.0) once they are available.
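If it helps when auditing several clusters, the version check can be sketched as below. This is a rough sketch that relies only on the three versions named above; the function name and the lookup table are illustrative, and for any release line not listed here it returns None rather than guessing.

```python
from typing import Optional

# Minimum patch release per CDH release line, taken only from the versions
# mentioned in this reply (5.15.2, 5.16.2, 6.1.0). Illustrative, not official.
FIXED_PATCH = {(5, 15): 2, (5, 16): 2, (6, 1): 0}


def has_kudu_2463_fix(cdh_version: str) -> Optional[bool]:
    """Return True/False for listed release lines, None when unknown."""
    major, minor, patch = (int(part) for part in cdh_version.split(".")[:3])
    minimum_patch = FIXED_PATCH.get((major, minor))
    if minimum_patch is None:
        return None  # release line not covered by the list above
    return patch >= minimum_patch
```

For the environment in this thread, `has_kudu_2463_fix("5.15.0")` evaluates to False, which is consistent with the cluster being affected.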
Created 12-12-2018 06:10 AM
Do you have any information on when versions 5.15.2/5.16.2 will be released?
Thank you