Created on 12-01-2017 12:54 AM - edited 09-16-2022 05:35 AM
When I use Impala to load a large amount of data (about 100 GB) in one go and then run select count(1) immediately, I get the wrong count. When I execute the same SQL again, the total count is correct.
I want to know: besides a leader change, are there any other internal operations that can cause scan inconsistency? If I change the Impala configuration from kudu_read_mode: READ_LATEST to kudu_read_mode: READ_AT_SNAPSHOT, what timestamp will Impala transmit? Would READ_AT_SNAPSHOT resolve the issue?
I am using Impala 2.10.0 + Kudu 1.5.0.
Created 12-13-2017 06:48 PM
Created 12-07-2017 10:57 AM
Created on 12-13-2017 05:27 PM - edited 12-13-2017 05:28 PM
Hi awong,
READ_LATEST doesn't guarantee consistency because when a scan gets sent to a replica (not necessarily the leader), that replica will respond with the latest data it has available (rather than at a specific timestamp). If that replica is being caught up or is behind in terms of replication for some reason, this will be a stale result.
1. Do you mean that Impala 2.10 always chooses ReplicaSelection.CLOSEST_REPLICA to build the scanner? I ask because I only use Impala to insert and select.
2. If the scanner is built with ReplicaSelection.LEADER_ONLY and READ_LATEST, is a leadership change the only thing that can cause scan inconsistency?
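To make sure I understand the quoted explanation, here is a small toy model of it. This is only a sketch: ToyReplica, count_latest, count_at_snapshot and run_demo are hypothetical names I made up, not Kudu or Impala APIs, and integer timestamps stand in for real commit timestamps. As far as I understand, a real snapshot scan would wait for the replica to catch up rather than return None, and the model does not cover leadership changes at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyReplica:
    """Hypothetical replica holding the (timestamp, row) writes it has applied."""
    name: str
    applied: List[Tuple[int, str]] = field(default_factory=list)

    def apply(self, ts: int, row: str) -> None:
        # Writes are assumed to arrive in timestamp order in this toy model.
        self.applied.append((ts, row))

    def safe_time(self) -> int:
        # Highest timestamp this replica has applied so far (0 if empty).
        return max((ts for ts, _ in self.applied), default=0)

    def count_latest(self) -> int:
        # READ_LATEST-style read: count whatever this replica has right now.
        return len(self.applied)

    def count_at_snapshot(self, snapshot_ts: int) -> Optional[int]:
        # READ_AT_SNAPSHOT-style read: only answer once the replica has
        # caught up to the requested timestamp. Returning None here is a
        # simplification standing in for "not ready yet".
        if self.safe_time() < snapshot_ts:
            return None
        return sum(1 for ts, _ in self.applied if ts <= snapshot_ts)


def run_demo():
    writes = [(ts, f"row-{ts}") for ts in range(1, 11)]  # 10 inserted rows
    leader = ToyReplica("leader")
    follower = ToyReplica("follower")

    for ts, row in writes:
        leader.apply(ts, row)
    for ts, row in writes[:6]:
        follower.apply(ts, row)  # follower lags: only 6 of 10 rows applied

    stale_count = follower.count_latest()   # latest read on lagging follower
    leader_count = leader.count_latest()    # latest read on the leader

    snapshot_ts = writes[-1][0]             # timestamp of the last write
    before = follower.count_at_snapshot(snapshot_ts)  # follower not ready

    for ts, row in writes[6:]:
        follower.apply(ts, row)             # replication catches up
    after = follower.count_at_snapshot(snapshot_ts)

    return stale_count, leader_count, before, after
```

In this model the lagging follower undercounts on a latest-style read, while the snapshot-style read gives no answer until the follower has applied everything up to the requested timestamp, which would match what I see when the second count(1) returns the right total.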
Best regards,
Tony
Created on 12-14-2017 05:20 PM - edited 12-14-2017 05:22 PM
@awong, thanks for your quick reply.