Support Questions

tony12 · ‎12-01-2017

When I try to use impala to transfer massive data (about 100G) for one time and select count(1) immediately, I get the wrong num. Then I execute the same sql again, the total count is correct.

I want to know besides leader change, is there have any other internal ops can cause the scan inconsistency? If I change the impala configure kudu_read_mode: READ_LATEST to kudu_read_mode: READ_AT_SNAPSHOT, what's the timestamp that the impala will transimit? If the READ_AT_SNAPSHOT can resolve the issue?

I am using the impala 2.10.0 + kudu 1.5.0.

awong · ‎12-13-2017

1. The default in Impala appears to be CLOSEST_REPLICA. As I understand it,
this means that the Kudu scanners used by Impala will try to use replicas
that are on the same node as the scanner if any such replicas exist, and
select replicas randomly otherwise. Perhaps someone with more context on
Impala can chime in here.

2. Right, e.g. if leader A gets network partitioned from the other nodes
{B, C, D} such that the other nodes are able to elect a new leader, and at
the same time, a client uses its cached leader A and scans A in LEADER_ONLY
mode, then the results will be stale in comparison to any progress made by
{B, C, D}. There may be more scenarios that might demonstrate what you're
asking about, but I think this one fits.

View solution in original post

awong · ‎12-07-2017

Leadership changes don't necessarily cause inconsistent results.
READ_LATEST doesn't guarantee consistency because when a scan gets sent to
a replica (not necessarily the leader), that replica will respond with the
latest data it has available (rather than at a specific timestamp). If that
replica is being caught up or is behind in terms replication for some
reason, this will be a stale result.

For more info about Impala's READ_AT_SNAPSHOT behavior, check out
IMPALA-3788 . From one
of the comments: "the client will pick a ts that is after the latest
observed session ts (propagated and set with SetLatestObservedTimestamp()
by the previous commit for IMPALA-3788) and perform a snapshot read at that
time." It seems this is still somewhat experimental, but at least should
provide some reasonable guarantees (i.e. "read-your-writes").

tony12 · ‎12-13-2017

Hi awong,

READ_LATEST doesn't guarantee consistency because when a scan gets sent to a replica (not necessarily the leader), that replica will respond with the latest data it has available (rather than at a specific timestamp). If that replica is being caught up or is behind in terms replication for some reason, this will be a stale result.

1. You mean the impala 2.10 always choose the ReplicaSelection.CLOSEST_REPLICA to build the scanner ? because I only use impala to insert and select.

2. If the scanner choose ReplicaSelection.LEADER_ONLY and READ_LATEST to build, only the leadership change will cause the scan inconsistency?

Best regards,

Tony

awong · ‎12-13-2017

1. The default in Impala appears to be CLOSEST_REPLICA. As I understand it,
this means that the Kudu scanners used by Impala will try to use replicas
that are on the same node as the scanner if any such replicas exist, and
select replicas randomly otherwise. Perhaps someone with more context on
Impala can chime in here.

2. Right, e.g. if leader A gets network partitioned from the other nodes
{B, C, D} such that the other nodes are able to elect a new leader, and at
the same time, a client uses its cached leader A and scans A in LEADER_ONLY
mode, then the results will be stale in comparison to any progress made by
{B, C, D}. There may be more scenarios that might demonstrate what you're
asking about, but I think this one fits.

tony12 · ‎12-14-2017

@awong，Thanks for your quick reply.

Cloudera Community

Support Questions

[KUDU] Does impala scan (READ_LATEST mode) inconsistency only arise during leader change?