Support Questions
Find answers, ask questions, and share your expertise

Authentication token expired at impala coordinators after Kudu restart

Explorer

Hi,

 

The issue (see log below) causes to be unable to launch impala query on Kudu tables through coordinator (but launch through executors work fine).

Appeared after Kudu (masters and tservers) was restarted.

CDH 5.14.2.

LDAP authentication is enabled for impala coordinator nodes only.

 

Let me know if any additional information is needed.

Any ideas how to avoid the issue?

 

Message in impalad log at impala coordinator daemon:

Tuple(id=0 size=49 slots=[Slot(id=0 type=STRING col_path=[] offset=0 null=(offset=48 mask=1) slot_idx=0 field_idx=-1), Slot(id=1 type=STRING col_path=[] offset=16 null=(offset=48 mask=2) slot_idx=1 field_idx=-1), Slot(id=2 type=STRING col_path=[] offset=32 null=(offset=48 mask=4) slot_idx=2 field_idx=-1)] tuple_path=[])
I0503 09:29:01.486366 41053 coordinator.cc:370] started execution on 1 backends for query 964f6d68c729a5cd:2ad2ba7200000000
I0503 09:29:01.486568 42404 query-state.cc:377] Executing instance. instance_id=964f6d68c729a5cd:2ad2ba7200000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
I0503 09:29:01.487128 42403 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=964f6d68c729a5cd:2ad2ba7200000000 refcnt=3
I0503 09:29:01.487524 41053 impala-hs2-server.cc:492] ExecuteStatement(): return_val=TExecuteStatementResp {
01: status (struct) = TStatus {
01: statusCode (i32) = 0,
},
02: operationHandle (struct) = TOperationHandle {
01: operationId (struct) = THandleIdentifier {
01: guid (string) = "<some value>",
02: secret (string) = "<some value>",
},
02: operationType (i32) = 0,
03: hasResultSet (bool) = false,
},
}
I0503 09:29:01.487515 42406 coordinator.cc:789] Coordinator waiting for backends to finish, 1 remaining
I0503 09:29:01.491019 42404 client-internal.cc:281] Determining the new leader Master and retrying...
I0503 09:29:01.507668 42404 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <Kudu_leader_IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
I0503 09:29:01.528237 42404 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <Kudu_leader_IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
...

 

 

 

14 REPLIES 14

Contributor

Hi Andreyeff,

 

'FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired' error indicates this is authn token expiration issue. However, in CDH5.14.2, Kudu client should be able to automatically re-acquires authn token when needed.  Do you know which Kudu client version you are using (CDH5.14.2)?  When you launch the impala query, does the coordinator have primary credentials (i.e. Kerberos)?

 

Best,

Hao

Explorer

Answering your questions:

1) kudu -version
kudu 1.6.0-cdh5.14.2

However, not sure if Kudu client is used in communication between Impala and Kudu.

2) Impala coordinators use LDAP authentication, impala executors - don't.

Kerberos is not used.

Moreover, the new session is initiated and the same issue appears (this happens till Impala is restarted).

 

The sequence was:

Normally the cluster works, Impala-users connect and execute queries through coordinator.

There was Kudu leader change from one master to another (possibly also masterA->masterC->masterA).

The new Impala-user runs a query and receives this issue.

Impala restart has removed the effect and cluster returned to usual operating mode.

Expert Contributor

Is it possible that Impala was not restarted after a previous upgrade?

 

Please report the issue if this happens again on CDH 5.14+

 

Thanks,

Mike

Explorer

The cluster was stopped during upgrade (from 5.13.0), so impala (as well as other services) is expected to be started with new configs since then.

 

From cluster inspect the only thing remained running from 5.13 is supervisord process.

 

I'll share more details if it appears again.

Expert Contributor

Please do. Thanks in advance for that.

Explorer

The issue appeared again:

this time it was at nodes, which are both "coordinator only" and "coordinator+executor".

Coordinator+executor didn't have LDAP authentication.

 

Recent changes before the issue was noticed: there was added one new impala-coordinator-only instance.

 

One more observation:

  • impala external tables (on top of Kudu) continue working fine
  • impala tables stored as Kudu were failing with the warning (in impala CLI):
    WARNINGS: Unable to open Kudu table: Timed out: GetTableSchema timed out after deadline expired

 

 

Kudu-master log file:

 

negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from <impala-coord-executor-IP-address>:48648: authentication token expired
...

 

Impala-coord+executor log file:

I0801 13:46:10.203364 39650 coordinator.cc:370] started execution on 2 backends for query 21448fac89ef4281:3bffcd4800000000
I0801 13:46:10.238082 39926 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=21448fac89ef4281:3bffcd4800000000 refcnt=4
I0801 13:46:10.310528 39928 client-internal.cc:281] Determining the new leader Master and retrying...
I0801 13:46:10.347209 39928 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <kudu-leader-IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
I0801 13:46:10.374269 39928 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <kudu-leader-IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired

 

Restart of a single impala coordinator+executor instance (where the issue happened) didn't help.

Restart of all impala daemon instances resolved the issue.

Expert Contributor

@Andreyeff please remind us what version you are running at this time? Do you see anything related in the Catalog Daemon log?

Contributor

Hi,

 

Thank you for reporting an issue.

 

A couple of questions to clarify before starting troubleshooting the issue:

  1. How many Kudu masters do you have in your cluster?
  2. Do Kudu tables in Impala have all Kudu masters specified in their 'kudu.master_addresses' property?
  3. In the Impala's logs, do you see anything like 'Reconnecting to the cluster for a new authn token'?  If yes, what are the log lines right after that?
  4. In the Impala's logs, do you see anything like 'Unable to reconnect to the cluster for a new authn token'?  If yes, what are the error details that come along with that message?

 

 

Explorer

Thanks for fast replies and noting about version - forgot to mention it:

kudu 1.6.0-cdh5.14.4

Impala Shell v2.11.0-cdh5.14.4

 

mpercy,

1. Here is impala-shell (I have saved a piece of failed queries with a timestamp, so I'll try to provide details basing on it):

[<coord+executor_dns>>:21000] > select * from enabled_vars where pid="3020";
Query: select * from enabled_vars where pid="3020"
Query submitted at: 2018-08-01 13:47:35 (Coordinator: http://<coord+executor_dns>>:25000)
Query progress can be monitored at: http://<coord+executor_dns>>:25000/query_plan?query_id=2a43f7388ed44c8d:92ff202f00000000
WARNINGS: Unable to open Kudu table: Timed out: GetTableSchema timed out after deadline expired


[<coord+executor_dns>>:21000] > invalidate metadata;
Query: invalidate metadata
Query submitted at: 2018-08-01 13:48:45 (Coordinator: http://<coord+executor_dns>>:25000)
Query progress can be monitored at: http://<coord+executor_dns>>:25000/query_plan?query_id=e34f104534f46bd9:a22d22e200000000
Fetched 0 row(s) in 42.06s

2. Nothing much in catalog daemon log at that time:

I0801 13:46:19.166558 58677 catalog-server.cc:241] Catalog Version: 82226 Last Catalog Version: 82226
I0801 13:48:42.336992 47009 HdfsTable.java:1197] Incrementally loading table metadata for: <some_table>

 

 

Alexey1c,

1. 3 kudu masters

2. yes, all 3

show create table enabled_vars;
Query: show create table enabled_vars
+------------------------------------------------------------------------------------------------------------------------------------------------+
| result                                                                                                                                         |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| CREATE TABLE <desired_database>.enabled_vars (                                                                                                        |
|   pid STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,                                                                  |
|   var_code STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,                                                             |
|   PRIMARY KEY (pid, var_code)                                                                                                                  |
| )                                                                                                                                              |
| PARTITION BY HASH (pid, var_code) PARTITIONS 2                                                                                                 |
| STORED AS KUDU                                                                                                                                 |
| TBLPROPERTIES ('STATS_GENERATED_VIA_STATS_TASK'='true', 'kudu.master_addresses'='<master1>,<master2>,<master3>') |
+------------------------------------------------------------------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.07s

3. Yes, it is met at the failing impala-daemon, but only more than a day before issue (see timestamp):

I0731 09:13:59.293942  4640 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.293975  4637 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294001  4625 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294025  4632 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294055  4622 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294075  4643 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294252  4628 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294342  4648 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294668  4633 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.296530  4652 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.304276  4661 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.394750 29050 impala-internal-service.cc:44] ExecQueryFInstances(): query_id=464265b6715c2634:cdc2ab0500000000
I0731 09:13:59.395293 29050 query-exec-mgr.cc:46] StartQueryFInstances() query_id=464265b6715c2634:cdc2ab0500000000 coord=<another_coordinator_node>:22000
I0731 09:13:59.395375 29050 query-state.cc:173] Buffer pool limit for 464265b6715c2634:cdc2ab0500000000: 68719476736
I0731 09:13:59.396725 29050 initial-reservations.cc:60] Successfully claimed initial reservations (0) for query 464265b6715c2634:cdc2ab050
0000000

Do not see any errors closely after it.

4. No, can't find at the failing impala-daemon

Contributor

Thank you for the update.

 

As I understand the logs doesn't contain messages on failure to re-acquire authn token, and authn tokens were successfully re-acquired some time before the error referenced in the first post.

 

Two more questions:

  1. How far the timestamp referenced in the very fist post was from the time when Impala was unable to launch a query?
    I0503 09:29:01.528237 42404 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <Kudu_leader_IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
  2. What was the error message output to the user (if any), when Impala was unable to launch the query?

 

I might be wrong, but those messages about expired authn tokens might be a red herring.

 

Thank you!

Explorer

Unfortunately, the logrotation policies have just removed the logs for the case in the first post.

 

Are there any ideas/suggestions based on difference in table types - external and not? Both were based on Kudu, but external continued to work properly.

 

Contributor

Hi,

 

I have an update on this issue.

 

I've got a chance to look at this issue closer having on hand Impalad's log.  As it turned out, there is a bug in the Kudu C++ client and I think your case was a manifestation of that bug as well: 

 

  https://issues.apache.org/jira/browse/KUDU-2580

 

I posted a patch to fix the issue and I hope the fix will be reviewed and merged soon.

 

 

Best regards,

 

Alexey

 

Explorer

Great news, thanks!

Since it is Kudu client, does it mean it will be available when a new version of Impala is released with fixed Kudu client?

Contributor

Right, to make the fix into Impala it's necessary to relink impalad with patched Kudu client.

 

impalad is linked against the kudu_client dynamically, so in theory it might be possible just to replace the libkudu_client.so.0 library with the patched version.  However, that's really messy and I would not recommend that.  If you use CDH anyway, the best option is to wait for next release of CDH.  I don't know what version that will be, though.

 

If you want a workaround, set the --authn_token_validity_seconds flag to be months or even one year long  (i.e. --authn_token_validity_seconds=31536000) and restart Kudu masters.  You will need to enable experimental flags as well (i.e. add --unlock_experimental_flags).

; ;