Support Questions

Find answers, ask questions, and share your expertise

[KUDU] Does impala scan (READ_LATEST mode) inconsistency only arise during leader change?

avatar
Explorer

When I try to use impala to transfer massive data (about 100G) for one time and select count(1) immediately, I get the wrong num. Then I execute the same sql again, the total count is correct.

 

I want to know besides leader change, is there have any other internal ops can cause the scan inconsistency? If I change the impala configure kudu_read_mode: READ_LATEST to kudu_read_mode: READ_AT_SNAPSHOT,  what's the timestamp that the impala will transimit? If the READ_AT_SNAPSHOT can resolve the issue?

 

I am using the impala 2.10.0 + kudu 1.5.0.

1 ACCEPTED SOLUTION

avatar
Rising Star
1. The default in Impala appears to be CLOSEST_REPLICA. As I understand it,
this means that the Kudu scanners used by Impala will try to use replicas
that are on the same node as the scanner if any such replicas exist, and
select replicas randomly otherwise. Perhaps someone with more context on
Impala can chime in here.

2. Right, e.g. if leader A gets network partitioned from the other nodes
{B, C, D} such that the other nodes are able to elect a new leader, and at
the same time, a client uses its cached leader A and scans A in LEADER_ONLY
mode, then the results will be stale in comparison to any progress made by
{B, C, D}. There may be more scenarios that might demonstrate what you're
asking about, but I think this one fits.

View solution in original post

4 REPLIES 4

avatar
Rising Star
Leadership changes don't necessarily cause inconsistent results.
READ_LATEST doesn't guarantee consistency because when a scan gets sent to
a replica (not necessarily the leader), that replica will respond with the
latest data it has available (rather than at a specific timestamp). If that
replica is being caught up or is behind in terms replication for some
reason, this will be a stale result.

For more info about Impala's READ_AT_SNAPSHOT behavior, check out
IMPALA-3788 . From one
of the comments: "the client will pick a ts that is after the latest
observed session ts (propagated and set with SetLatestObservedTimestamp()
by the previous commit for IMPALA-3788) and perform a snapshot read at that
time." It seems this is still somewhat experimental, but at least should
provide some reasonable guarantees (i.e. "read-your-writes").

avatar
Explorer

Hi awong,

 

READ_LATEST doesn't guarantee consistency because when a scan gets sent to a replica (not necessarily the leader), that replica will respond with the latest data it has available (rather than at a specific timestamp). If that replica is being caught up or is behind in terms replication for some reason, this will be a stale result.

 

1. You mean the impala 2.10 always choose the ReplicaSelection.CLOSEST_REPLICA to build the scanner ? because I only use impala to insert and select.

2. If the scanner choose ReplicaSelection.LEADER_ONLY  and  READ_LATEST to build, only the leadership change will cause the scan inconsistency?

 

 Best regards,

Tony

 

avatar
Rising Star
1. The default in Impala appears to be CLOSEST_REPLICA. As I understand it,
this means that the Kudu scanners used by Impala will try to use replicas
that are on the same node as the scanner if any such replicas exist, and
select replicas randomly otherwise. Perhaps someone with more context on
Impala can chime in here.

2. Right, e.g. if leader A gets network partitioned from the other nodes
{B, C, D} such that the other nodes are able to elect a new leader, and at
the same time, a client uses its cached leader A and scans A in LEADER_ONLY
mode, then the results will be stale in comparison to any progress made by
{B, C, D}. There may be more scenarios that might demonstrate what you're
asking about, but I think this one fits.

avatar
Explorer

@awong,Thanks for your quick reply.