Created on 05-03-2018 01:02 AM - edited 09-16-2022 06:10 AM
Hi,
The issue (see the log below) makes it impossible to launch Impala queries on Kudu tables through a coordinator (launching through executors works fine).
It appeared after Kudu (masters and tservers) was restarted.
CDH 5.14.2.
LDAP authentication is enabled for impala coordinator nodes only.
Let me know if any additional information is needed.
Any ideas how to avoid the issue?
Message in the impalad log of the Impala coordinator daemon:
Tuple(id=0 size=49 slots=[Slot(id=0 type=STRING col_path=[] offset=0 null=(offset=48 mask=1) slot_idx=0 field_idx=-1), Slot(id=1 type=STRING col_path=[] offset=16 null=(offset=48 mask=2) slot_idx=1 field_idx=-1), Slot(id=2 type=STRING col_path=[] offset=32 null=(offset=48 mask=4) slot_idx=2 field_idx=-1)] tuple_path=[])
I0503 09:29:01.486366 41053 coordinator.cc:370] started execution on 1 backends for query 964f6d68c729a5cd:2ad2ba7200000000
I0503 09:29:01.486568 42404 query-state.cc:377] Executing instance. instance_id=964f6d68c729a5cd:2ad2ba7200000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
I0503 09:29:01.487128 42403 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=964f6d68c729a5cd:2ad2ba7200000000 refcnt=3
I0503 09:29:01.487524 41053 impala-hs2-server.cc:492] ExecuteStatement(): return_val=TExecuteStatementResp { 01: status (struct) = TStatus { 01: statusCode (i32) = 0, }, 02: operationHandle (struct) = TOperationHandle { 01: operationId (struct) = THandleIdentifier { 01: guid (string) = "<some value>", 02: secret (string) = "<some value>", }, 02: operationType (i32) = 0, 03: hasResultSet (bool) = false, }, }
I0503 09:29:01.487515 42406 coordinator.cc:789] Coordinator waiting for backends to finish, 1 remaining
I0503 09:29:01.491019 42404 client-internal.cc:281] Determining the new leader Master and retrying...
I0503 09:29:01.507668 42404 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <Kudu_leader_IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
I0503 09:29:01.528237 42404 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <Kudu_leader_IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
...
Created 05-07-2018 04:42 PM
Hi Andreyeff,
The 'FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired' error indicates this is an authn token expiration issue. However, in CDH 5.14.2 the Kudu client should be able to automatically re-acquire an authn token when needed. Do you know which Kudu client version you are using (CDH 5.14.2)? When you launch the Impala query, does the coordinator have primary credentials (i.e. Kerberos)?
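For example, something like this could be run on a coordinator node to check for primary credentials (a rough sketch only; the hostname is a placeholder and it assumes the default impalad debug web UI port):

klist                                                                        # is there any Kerberos ticket cache on the host?
curl -s http://<coordinator_host>:25000/varz | grep -E 'principal|keytab'    # Kerberos-related impalad startup flags, if any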
Best,
Hao
Created on 05-08-2018 12:17 AM - edited 05-08-2018 12:19 AM
Answering your questions:
1) kudu -version
kudu 1.6.0-cdh5.14.2
However, I'm not sure whether this is the Kudu client that is used in the communication between Impala and Kudu.
2) Impala coordinators use LDAP authentication; Impala executors don't.
Kerberos is not used.
Moreover, the same issue appears even when a new session is initiated (this keeps happening until Impala is restarted).
The sequence was:
Normally the cluster works fine; Impala users connect and execute queries through the coordinator.
There was a Kudu leader change from one master to another (possibly also masterA -> masterC -> masterA).
A new Impala user runs a query and hits this issue.
An Impala restart removed the effect and the cluster returned to its usual operating mode.
Created 05-08-2018 06:07 PM
Is it possible that Impala was not restarted after a previous upgrade?
Please report the issue if this happens again on CDH 5.14+
Thanks,
Mike
Created 05-09-2018 12:50 AM
The cluster was stopped during the upgrade (from 5.13.0), so Impala (as well as the other services) is expected to have been started with the new binaries and configs since then.
According to the cluster inspection, the only thing still running from 5.13 is the supervisord process.
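For the record, a rough way to double-check that on a node (the parcel assumption and exact paths may differ on your install):

ps -o lstart=,cmd= -C impalad           # process start time vs. the upgrade date
ls -l /proc/$(pgrep -o impalad)/exe     # should resolve to the post-upgrade binary/parcel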
I'll share more details if it appears again.
Created 05-09-2018 06:18 PM
Please do. Thanks in advance for that.
Created on 08-01-2018 06:24 AM - edited 08-01-2018 06:26 AM
The issue appeared again:
this time it occurred both on "coordinator only" nodes and on "coordinator+executor" nodes.
The coordinator+executor nodes didn't have LDAP authentication.
Recent change before the issue was noticed: one new Impala coordinator-only instance had been added.
One more observation:
Kudu-master log file:
negotiation.cc:320] Unauthorized connection attempt: Server connection negotiation failed: server connection from <impala-coord-executor-IP-address>:48648: authentication token expired ...
Impala-coord+executor log file:
I0801 13:46:10.203364 39650 coordinator.cc:370] started execution on 2 backends for query 21448fac89ef4281:3bffcd4800000000
I0801 13:46:10.238082 39926 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=21448fac89ef4281:3bffcd4800000000 refcnt=4
I0801 13:46:10.310528 39928 client-internal.cc:281] Determining the new leader Master and retrying...
I0801 13:46:10.347209 39928 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <kudu-leader-IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
I0801 13:46:10.374269 39928 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <kudu-leader-IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
Restart of a single impala coordinator+executor instance (where the issue happened) didn't help.
Restart of all impala daemon instances resolved the issue.
Created 08-01-2018 11:55 AM
@Andreyeff, please remind us which version you are running at this time. Do you see anything related in the Catalog Daemon log?
Created 08-01-2018 12:26 PM
Hi,
Thank you for reporting the issue.
A couple of questions to clarify before starting to troubleshoot the issue:
Created on 08-03-2018 02:59 AM - edited 08-03-2018 08:11 AM
Thanks for the fast replies and for the note about the version - I forgot to mention it:
kudu 1.6.0-cdh5.14.4
Impala Shell v2.11.0-cdh5.14.4
1. Here is the impala-shell output (I saved a piece of the failed queries with a timestamp, so I'll try to provide details based on it):
[<coord+executor_dns>:21000] > select * from enabled_vars where pid="3020";
Query: select * from enabled_vars where pid="3020"
Query submitted at: 2018-08-01 13:47:35 (Coordinator: http://<coord+executor_dns>:25000)
Query progress can be monitored at: http://<coord+executor_dns>:25000/query_plan?query_id=2a43f7388ed44c8d:92ff202f00000000
WARNINGS: Unable to open Kudu table: Timed out: GetTableSchema timed out after deadline expired
[<coord+executor_dns>:21000] > invalidate metadata;
Query: invalidate metadata
Query submitted at: 2018-08-01 13:48:45 (Coordinator: http://<coord+executor_dns>:25000)
Query progress can be monitored at: http://<coord+executor_dns>:25000/query_plan?query_id=e34f104534f46bd9:a22d22e200000000
Fetched 0 row(s) in 42.06s
2. Nothing much in catalog daemon log at that time:
I0801 13:46:19.166558 58677 catalog-server.cc:241] Catalog Version: 82226 Last Catalog Version: 82226
I0801 13:48:42.336992 47009 HdfsTable.java:1197] Incrementally loading table metadata for: <some_table>
1. 3 kudu masters
2. yes, all 3
show create table enabled_vars;
Query: show create table enabled_vars
+------------------------------------------------------------------------------------------------------------------+
| result                                                                                                             |
+------------------------------------------------------------------------------------------------------------------+
| CREATE TABLE <desired_database>.enabled_vars (                                                                     |
|   pid STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,                                      |
|   var_code STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,                                 |
|   PRIMARY KEY (pid, var_code)                                                                                      |
| )                                                                                                                  |
| PARTITION BY HASH (pid, var_code) PARTITIONS 2                                                                     |
| STORED AS KUDU                                                                                                     |
| TBLPROPERTIES ('STATS_GENERATED_VIA_STATS_TASK'='true', 'kudu.master_addresses'='<master1>,<master2>,<master3>')   |
+------------------------------------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.07s
3. Yes, it appears in the log of the failing Impala daemon, but only more than a day before the issue (see the timestamps):
I0731 09:13:59.293942 4640 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.293975 4637 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294001 4625 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294025 4632 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294055 4622 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294075 4643 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294252 4628 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294342 4648 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.294668 4633 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.296530 4652 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.304276 4661 client-internal.cc:233] Reconnecting to the cluster for a new authn token
I0731 09:13:59.394750 29050 impala-internal-service.cc:44] ExecQueryFInstances(): query_id=464265b6715c2634:cdc2ab0500000000
I0731 09:13:59.395293 29050 query-exec-mgr.cc:46] StartQueryFInstances() query_id=464265b6715c2634:cdc2ab0500000000 coord=<another_coordinator_node>:22000
I0731 09:13:59.395375 29050 query-state.cc:173] Buffer pool limit for 464265b6715c2634:cdc2ab0500000000: 68719476736
I0731 09:13:59.396725 29050 initial-reservations.cc:60] Successfully claimed initial reservations (0) for query 464265b6715c2634:cdc2ab0500000000
I don't see any errors shortly after it.
4. No, I can't find it at the failing Impala daemon.
Created 08-04-2018 09:58 AM
Thank you for the update.
As I understand it, the logs don't contain messages about a failure to re-acquire an authn token, and authn tokens were successfully re-acquired some time before the error referenced in the first post.
Two more questions:
I0503 09:29:01.528237 42404 client-internal.cc:283] Unable to determine the new leader Master: Not authorized: Client connection negotiation failed: client connection to <Kudu_leader_IP>:7051: FATAL_INVALID_AUTHENTICATION_TOKEN: Not authorized: authentication token expired
I might be wrong, but those messages about expired authn tokens might be a red herring.
Thank you!
Created 08-06-2018 05:49 AM
Unfortunately, the log rotation policies have already removed the logs for the case in the first post.
Are there any ideas/suggestions based on the difference in table types - external and non-external? Both were backed by Kudu, but the external ones continued to work properly.
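In case it's useful, here is one way I can tell the two table types apart from impala-shell (a sketch; enabled_vars is the table from my earlier post, the hostname is a placeholder, and the "Table Type" values assume the usual metastore labels):

impala-shell -i <coordinator_host> -q "DESCRIBE FORMATTED <desired_database>.enabled_vars" | grep -i 'Table Type'
# MANAGED_TABLE  -> a table created and managed through Impala
# EXTERNAL_TABLE -> an external table mapped onto an existing Kudu table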
Created 09-17-2018 11:14 AM
Hi,
I have an update on this issue.
I got a chance to look at this issue more closely with an impalad log on hand. As it turned out, there is a bug in the Kudu C++ client, and I think your case was a manifestation of that bug as well:
https://issues.apache.org/jira/browse/KUDU-2580
I posted a patch to fix the issue and I hope the fix will be reviewed and merged soon.
Best regards,
Alexey
Created 09-18-2018 03:01 AM
Great news, thanks!
Since it is in the Kudu client, does that mean the fix will only become available when a new version of Impala is released with the fixed Kudu client?
Created 09-19-2018 01:18 PM
Right, to get the fix into Impala it's necessary to relink impalad with the patched Kudu client.
impalad is linked against kudu_client dynamically, so in theory it might be possible to just replace the libkudu_client.so.0 library with a patched version. However, that's really messy and I would not recommend it. If you use CDH anyway, the best option is to wait for the next release of CDH. I don't know what version that will be, though.
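You can see the dynamic linkage for yourself; the binary path below is an assumption for a parcel-based CDH install and may differ on your nodes:

ldd /opt/cloudera/parcels/CDH/lib/impala/sbin/impalad | grep libkudu_client   # shows which libkudu_client.so impalad resolves to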
If you want a workaround, set the --authn_token_validity_seconds flag to several months or even a year (e.g. --authn_token_validity_seconds=31536000) and restart the Kudu masters. You will need to enable experimental flags as well (i.e. add --unlock_experimental_flags).
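For example, the Kudu master gflagfile additions would look like this (with Cloudera Manager this would typically go into the Kudu Master advanced configuration snippet for the gflagfile; treat it as a temporary workaround and revert it once the fixed release is deployed):

# allow changing an experimental flag
--unlock_experimental_flags
# extend authn token validity to roughly one year (the default is much shorter)
--authn_token_validity_seconds=31536000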