Explorer
Posts: 21
Registered: ‎05-03-2018

Inconsistent query result from a Kudu table

Today we ran into the following situation:

Different Impala queries against the same table returned inconsistent results: one showed that the table contains data, the other showed that it is empty.

There were no data modifications in between, and the queries were executed several times in varying order.

 

The table is stored as Kudu.

The shown results are from impala-shell.

 

Environment:

CDH 5.15.0

Kudu 1.7.0-cdh5.15.0 (3 masters + 16 tservers)

Impala-shell v2.12.0-cdh5.15.0

 

Query1. 

[node009.mydomain.net:21000] > select * from mydb1.table1 limit 20 ;
Query: select * from mydb1.table1 limit 20
Query submitted at: 2018-11-06 13:30:19 (Coordinator: http://node009:25000)
Query progress can be monitored at: http://node009:25000/query_plan?query_id=d84287045f155a50:251e230100000000
+---------+------------+---------------+----------+-----------+----------+
| myfield1| myfield2   | myfield3      | myfield4 | myfield5  | myfield6 |
+---------+------------+---------------+----------+-----------+----------+
| 19      | 0          | 1279900254208 | z0012    | 22        | M        |
...
| 302     | 0          | 1194001234293 | c1236    | 3         | A        |
+---------+------------+---------------+----------+-----------+----------+
Fetched 20 row(s) in 21.13s

Inspecting the tablets in Kudu with "kudu fs list" shows multiple rowsets containing data.

 

 

Query2.  

[node009.mydomain.net:21000] > select count(*) from mydb1.table1 ;
Query: select count(*) from mydb1.table1
Query submitted at: 2018-11-06 13:32:56 (Coordinator: http://node009:25000)
Query progress can be monitored at: http://node009:25000/query_plan?query_id=874eb1365115b065:5527220e00000000
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Fetched 1 row(s) in 34.71s

It may be worth mentioning that the query time is quite long, as it would be if the table did contain data.

Cloudera Employee
Posts: 17
Registered: ‎02-22-2017

Re: Inconsistent query result from a Kudu table

Hi Andreyeff,

 

You mentioned there were 'no data modifications in between' the queries, but did you verify whether any ingestion (that hadn't finished) was still going on when you issued these queries? Since Impala uses the READ_LATEST scan mode, it is possible that the scan took place on a stale replica. Also, did you run the `ksck` tool to check whether the cluster is in a healthy state?
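
To illustrate the stale-replica point, here is a toy Python model. It is only a sketch and not Kudu code: the `Replica` class and `replicate` helper are made up for illustration. It assumes a READ_LATEST-style scan simply returns whatever the serving replica has applied so far, so a lagging follower can report fewer rows than the leader.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for one tablet replica (hypothetical, not a Kudu API)."""
    name: str
    rows: list = field(default_factory=list)

    def scan_latest(self):
        # A READ_LATEST-style scan: return whatever this replica has applied,
        # with no guarantee that it has caught up with the leader.
        return list(self.rows)


def replicate(write_log, replica, upto):
    """Apply write-log entries to the replica until it holds `upto` rows."""
    for row in write_log[len(replica.rows):upto]:
        replica.rows.append(row)


write_log = ["row1", "row2", "row3"]
leader = Replica("leader")
follower = Replica("follower")

replicate(write_log, leader, 3)    # leader has applied every write
replicate(write_log, follower, 1)  # follower is lagging behind

# The same scan gives a different row count depending on the replica used.
counts = {r.name: len(r.scan_latest()) for r in (leader, follower)}
print(counts)
```

In this model the two scans disagree only while the follower is behind; once it catches up, both return the same rows.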

 

Best,

Hao

Explorer
Posts: 21
Registered: ‎05-03-2018

Re: Inconsistent query result from a Kudu table

These queries were executed multiple times in alternating order, something like "query1", "query2", "query1", "query2", ..., for about half an hour.

The Kudu cluster was also checked with ksck, and it was healthy.
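
For anyone who wants to repeat this alternating check in a script, a minimal Python sketch follows. The names `alternate_queries`, `unstable_queries` and `fake_run_query` are hypothetical; `run_query` is assumed to be any callable that executes a SQL string and returns a list of row tuples (for example a thin wrapper around whatever Impala client is in use). The fake runner only mimics the two results shown above so the sketch runs without a cluster.

```python
from collections import defaultdict


def summarize(rows):
    """Reduce a result to one comparable value: the scalar for a 1x1 result
    (e.g. count(*)), otherwise the number of rows returned."""
    if len(rows) == 1 and len(rows[0]) == 1:
        return rows[0][0]
    return len(rows)


def alternate_queries(run_query, queries, rounds):
    """Run the queries in order, `rounds` times, collecting the distinct
    summaries seen for each query."""
    seen = defaultdict(set)
    for _ in range(rounds):
        for sql in queries:
            seen[sql].add(summarize(run_query(sql)))
    return seen


def unstable_queries(seen):
    """Queries whose summary changed between rounds."""
    return [sql for sql, results in seen.items() if len(results) > 1]


def fake_run_query(sql):
    # Hypothetical stand-in for a real Impala call; mimics the thread's output.
    if sql.startswith("select count(*)"):
        return [(0,)]
    return [(19, 0, 1279900254208, "z0012", 22, "M")] * 20


queries = [
    "select * from mydb1.table1 limit 20",
    "select count(*) from mydb1.table1",
]
seen = alternate_queries(fake_run_query, queries, rounds=3)
print(dict(seen))
print(unstable_queries(seen))
```

Row counts are compared rather than row contents because a LIMIT query without ORDER BY may legitimately return different rows on each run.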

Cloudera Employee
Posts: 17
Registered: ‎02-22-2017

Re: Inconsistent query result from a Kudu table

Hi Andreyeff,

 

Does 'query1' always return consistent results? And the same for 'query2'? If you run 'query1' and 'query2' now, do they return consistent results?

 

Best,

Hao

Cloudera Employee
Posts: 17
Registered: ‎02-22-2017

Re: Inconsistent query result from a Kudu table

Hi Andreyeff,

 

It's possible this is an instance of KUDU-2463. Does the following description match your case?

  1. Are you actively writing to the table regularly? Based on the schema, are the writes going to every tablet? If they are, that is evidence against the issue being KUDU-2463.

  2. When was the last time you restarted a tablet server? If the incorrect results were only noticed after a restart, that is evidence for the issue being KUDU-2463.

Best,

Hao
