When I try to use impala to transfer massive data (about 100G) for one time and select count(1) immediately, I get the wrong num. Then I execute the same sql again, the total count is correct.
I want to know besides leader change, is there have any other internal ops can cause the scan inconsistency? If I change the impala configure kudu_read_mode: READ_LATEST to kudu_read_mode: READ_AT_SNAPSHOT, what's the timestamp that the impala will transimit? If the READ_AT_SNAPSHOT can resolve the issue?
I am using the impala 2.10.0 + kudu 1.5.0.
READ_LATEST doesn't guarantee consistency because when a scan gets sent to a replica (not necessarily the leader), that replica will respond with the latest data it has available (rather than at a specific timestamp). If that replica is being caught up or is behind in terms replication for some reason, this will be a stale result.
1. You mean the impala 2.10 always choose the ReplicaSelection.CLOSEST_REPLICA to build the scanner ? because I only use impala to insert and select.
2. If the scanner choose ReplicaSelection.LEADER_ONLY and READ_LATEST to build, only the leadership change will cause the scan inconsistency?