
ImpalaRuntimeException: Unable to initialize the Kudu scan node

Expert Contributor

Hi Cloudera gurus,

This is my CDP environment (attachment: cdp_env.png):

3 master nodes + 3 worker nodes

HA is enabled and I am testing it.

Here is the issue: when I shut down Master 2, some queries randomly fail with this error:

# impala-shell -i haproxy-server.com -q "use dbschema; select * from table_foo limit 10;"
Starting Impala Shell without Kerberos authentication
Warning: live_progress only applies to interactive shell sessions, and is being skipped for now.
Opened TCP connection to haproxy-server.com:21000
Connected to haproxy-server.com:21000
Server version: impalad version 3.4.0-SNAPSHOT RELEASE (build 0cadcf7ac76ecec87d9786048db3672c37d41c6f)
Query: use dbschema
Query: select * from table_foo limit 10
Query submitted at: 2022-03-23 11:28:37 (Coordinator: http://worker1:25000)
ERROR: ImpalaRuntimeException: Unable to initialize the Kudu scan node
CAUSED BY: AnalysisException: Unable to open the Kudu table: dbschema.table_foo
CAUSED BY: NonRecoverableException: cannot complete before timeout: KuduRpc(method=GetTableSchema, tablet=Kudu Master, attempt=1, TimeoutTracker(timeout=180000, elapsed=180004), Trace Summary(0 ms): Sent(1), Received(0), Delayed(0), MasterRefresh(0), AuthRefresh(0), Truncated: false
Sent: (master-192.168.1.10:7051, [ GetTableSchema, 1 ]))

Could not execute command: select * from table_foo limit 10

The thing is that all the leaders are correctly rebalanced to other nodes, and something is still working, because most queries succeed.
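(For reference, one way to pin down the randomness is to bypass HAProxy and run the same query against each coordinator directly. A minimal sketch reusing hostnames from this thread; worker1 is the coordinator from the error above, while worker2/worker3 acting as coordinators is an assumption:)

# Bypass the load balancer and hit each Impala coordinator directly:
impala-shell -i worker1:21000 -q "use dbschema; select * from table_foo limit 10;"
impala-shell -i worker2:21000 -q "use dbschema; select * from table_foo limit 10;"
impala-shell -i worker3:21000 -q "use dbschema; select * from table_foo limit 10;"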

Does anyone have a clue? I was thinking about HiveServer2, but I am not sure how to trace it.

Note: since CM is on Master 2, it is also unavailable (this is not the cause: separate tests were done with CM out of service and queries worked fine).

Note 2: could it be relevant that one of the Kudu masters was on Master 2?
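(A quick way to check that from the client side is to ask the surviving masters which masters they know about. A minimal sketch with the kudu CLI, assuming it is installed on the node and using this thread's hostnames; the current LEADER is also shown on any live master's web UI under /masters, port 8051 by default.)

# List the masters as seen by the two remaining ones:
kudu master list master1.server.com:7051,master3.server.com:7051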

Many thanks in advance for your help.

Best Regards

4 REPLIES

Master Collaborator

Hello @Juanes,

Could you please check the ksck report from Kudu? If it shows any unhealthy tables, please verify the replicas as well.

Please refer to doc [1].

doc [1]: https://kudu.apache.org/docs/administration.html#tablet_majority_down_recovery
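For reference, a typical invocation is sketched below; substitute your real master addresses:

# Full cluster health check against all configured masters:
kudu cluster ksck master1.server.com:7051,master2.server.com:7051,master3.server.com:7051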

Thanks, 

Expert Contributor

Hello,

The ksck report shows that the tables are OK (Recovering, Under-replicated, and Unavailable are all 0), but it also reports these warnings and errors:

W0324 12:15:41.325619 18080 negotiation.cc:313] Failed RPC negotiation. Trace:

0324 12:15:40.627405 (+     0us) reactor.cc:609] Submitting negotiation task for client connection to master2:7051
0324 12:15:40.627616 (+   211us) negotiation.cc:98] Waiting for socket to connect
0324 12:15:41.325243 (+697627us) negotiation.cc:304] Negotiation complete: Network error: Client connection negotiation failed: client connection to master2:7051: connect: No route to host (error 113)
Metrics: {"client-negotiator.queue_time_us":187,"thread_start_us":157,"threads_started":1}
W0324 12:15:44.329090 18080 negotiation.cc:313] Failed RPC negotiation. Trace:
0324 12:15:41.325954 (+     0us) reactor.cc:609] Submitting negotiation task for client connection to master2:7051
0324 12:15:41.326027 (+    73us) negotiation.cc:98] Waiting for socket to connect
0324 12:15:44.329031 (+3003004us) negotiation.cc:304] Negotiation complete: Timed out: Client connection negotiation failed: client connection to master2:7051: Timeout exceeded waiting to connect
Metrics: {"client-negotiator.queue_time_us":38}
W0324 12:15:44.331089 18080 negotiation.cc:313] Failed RPC negotiation. Trace:
0324 12:15:44.329518 (+     0us) reactor.cc:609] Submitting negotiation task for client connection to master2:7051
0324 12:15:44.329580 (+    62us) negotiation.cc:98] Waiting for socket to connect
0324 12:15:44.331065 (+  1485us) negotiation.cc:304] Negotiation complete: Network error: Client connection negotiation failed: client connection to master2:7051: connect: No route to host (error 113)
Metrics: {"client-negotiator.queue_time_us":36}
Master Summary
                  UUID                  |          Address           |   Status
----------------------------------------+----------------------------+-------------
 8a168b68a6dd487c952419672ba32088       | master3.server.com         | HEALTHY
 daa9129e78244be2aaa7e5e649cc1dc8       | master1.server.com         | HEALTHY
 <unknown> (master2.server.com)         | master2.server.com         | UNAVAILABLE
Error from master2.server.com: Network error: Client connection negotiation failed: client connection to master2:7051: connect: No route to host (error 113) (UNAVAILABLE)
All reported replicas are:
  A = daa9129e78244be2aaa7e5e649cc1dc8
  B = <unknown> (master2.server.com)
  C = 8a168b68a6dd487c952419672ba32088
  D = 1f02c618009c44d381c55841dcb5a498
The consensus matrix is:
 Config source |        Replicas        | Current term | Config index | Committed?
---------------+------------------------+--------------+--------------+------------
 A             | A*      C   D          | 88           | -1           | Yes
 B             | [config not available] |              |              |
 C             | A*      C   D          | 88           | -1           | Yes
Flags of checked categories for Master:
        Flag         |                            Value                            |                         Master
---------------------+-------------------------------------------------------------+--------------------------------------------------------
 builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | master1.server.com, master3.server.com
 time_source         | system                                                      | master1.server.com, master3.server.com
Tablet Server Summary
               UUID               |             Address             | Status  | Location | Tablet Leaders | Active Scanners
----------------------------------+---------------------------------+---------+----------+----------------+-----------------
 0dcf7da9a99c43dd99f956a95abb773d | worker3.server.com:7050         | HEALTHY | /default |       85       |        0
 345166476e6440b4a957d76dd1de947e | worker2.server.com:7050         | HEALTHY | /default |      314       |        0
 9e30289cb8d244e0a7fb822897fc7c29 | worker1.server.com:7050         | HEALTHY | /default |     1051       |        0
Tablet Server Location Summary
 Location |  Count
----------+---------
 /default |       3

Flags of checked categories for Tablet Server:
        Flag         |                            Value                            |      Tablet Server
---------------------+-------------------------------------------------------------+-------------------------
 builtin_ntp_servers | 0.pool.ntp.org,1.pool.ntp.org,2.pool.ntp.org,3.pool.ntp.org | all 3 server(s) checked
 time_source         | system                                                      | all 3 server(s) checked

Version Summary
      Version       |                                                               Servers
--------------------+--------------------------------------------------------------------------------------------------------------------------------------
 1.13.0.7.1.6.0-297 | master@master3.server.com, master@master1.server.com, tserver@worker3.server.com:7050, and 2 other server(s)

Tablet Summary
Summary by table
                                  Name                                   | RF | Status  | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
-------------------------------------------------------------------------+----+---------+---------------+---------+------------+------------------+-------------

Tablet Replica Count Summary
 Statistic      | Replica Count
----------------+---------------
 Minimum        | 1450
 First Quartile | 1450
 Median         | 1450
 Third Quartile | 1450
 Maximum        | 1450

Total Count Summary
                | Total Count
----------------+-------------
 Masters        | 3
 Tablet Servers | 3
 Tables         | 109
 Tablets        | 1450
 Replicas       | 4350

==================
Warnings:
==================
master unusual flags check error: 1 of 3 masters were not available to retrieve unusual flags
master diverged flags check error: 1 of 3 masters were not available to retrieve time_source category flags

==================
Errors:
==================
Network error: error fetching info from masters: failed to gather info from all masters: 1 of 3 had errors
Corruption: master consensus error: there are master consensus conflicts

What I do not understand is why I am getting a consensus error when 2 of 3 masters are UP and all 3 tablet servers are UP.

Many thanks for your help.

Expert Contributor

Hello,

Does anyone know if there is a reference table for these errors?

I just want to know what this means: "Unable to initialize the Kudu scan node".

No relevant traces were found in the following logs (see the grep sketch after this list):

Impala Daemon

Impala Catalog Server

Impala State Store

Kudu Master Leader

Kudu tablet

Hive Metastore

Hive Server2
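(For anyone repeating this search, a rough sketch of the kind of grep used; the log paths are common CDP defaults and may differ on your hosts:)

# Look for the failing RPC and the scan-node error on the Impala coordinator:
grep -i "GetTableSchema" /var/log/impalad/impalad.INFO
grep -i "Unable to initialize the Kudu scan node" /var/log/impalad/impalad.INFO
# And for connection problems on the Kudu master side:
grep -i "negotiation" /var/log/kudu/kudu-master.INFO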

 

I'm running out of ideas 😞

Expert Contributor

Hello,

It seems the main error is related to Impala. Kudu is rebalancing and responding well during the tests; the issue is that Impala breaks its connection to whatever tells it where the new Kudu master LEADER is. I am suspicious of the Cloudera Management services, which are already down.
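One thing worth checking for this theory: Impala locates the Kudu masters through the table's kudu.master_addresses property (falling back to the impalad -kudu_master_hosts default), and if only a single master is listed there, the client has nothing to fail over to when that host goes down. A minimal sketch using the table from this thread:

# Show the table definition, including TBLPROPERTIES such as kudu.master_addresses:
impala-shell -i haproxy-server.com -q "show create table dbschema.table_foo;"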

I will update this thread with the solution whenever I have it.