Created on 05-23-2021 08:08 PM - edited 05-23-2021 08:52 PM
KUDU version: 1.9.0+cdh6.2.0
5. block_cache_capacity_mb: 2 GB
The cluster has 4 tablet servers, and three YARN NodeManagers run on the same nodes as tablet servers.
When I run an MR job on YARN (just Hive SQL), a Kudu tablet server quits at random, with the following logs:
T 5e5cdb8cf25d4c93aeaf013781419109 P ac586d8c49f84c4c82770ae079256893 -> Peer ac44fc76284d4b959eca897309e465b0 (ch4.360kad.com:7050): Couldn't send request to peer ac44fc76284d4b959eca897309e465b0. Status: Remote error: Service unavailable: UpdateConsensus request on kudu.consensus.ConsensusService from 10.0.57.16:26274 dropped due to backpressure. The service queue is full; it has 50 items.. This is attempt 1: this message will repeat every 5th retry.
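To see which peers and RPC methods are affected, and how often, warnings of this kind can be counted from the tablet server log. The following is only a minimal sketch: the regular expression is derived from the single warning quoted above, so other Kudu versions may word the message differently, and the function names are hypothetical helpers, not part of Kudu.

```python
import re
from collections import Counter

# Pattern derived from the warning quoted above (an assumption: other
# Kudu versions may format this message differently).
BACKPRESSURE_RE = re.compile(
    r"Peer (?P<peer>[0-9a-f]{32}) \((?P<addr>[^)]+)\).*?"
    r"(?P<method>\w+) request on (?P<service>[\w.]+) "
    r"from (?P<src>[\d.]+:\d+) dropped due to backpressure\. "
    r"The service queue is full; it has (?P<queue>\d+) items"
)


def parse_backpressure(line):
    """Return the fields of a dropped-RPC warning, or None if no match."""
    m = BACKPRESSURE_RE.search(line)
    return m.groupdict() if m else None


def count_backpressure_drops(lines):
    """Count dropped-RPC warnings per (peer address, RPC method)."""
    counts = Counter()
    for line in lines:
        fields = parse_backpressure(line)
        if fields:
            counts[(fields["addr"], fields["method"])] += 1
    return counts
```

For example, `count_backpressure_drops(open(path))` over a tablet server log file (where `path` is wherever your deployment writes its warning log) gives a per-peer tally that shows whether one server is receiving most of the dropped requests.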
I don't know how to solve this problem.
Those warning messages about RPC requests dropped due to backpressure are a sign that the particular tablet server is likely overloaded. Consider the following remedies:
Thanks for your reply. Unfortunately, an upgrade is not available at this time in my company.
I have rebalanced my tablet servers and changed the config: maintenance_manager_num_threads to 8, block_cache_capacity_mb to 512 MB, and memory_limit_hard_bytes to 60 GB.
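For reference, the three settings above could be written as tablet server flags roughly as follows. This is a sketch under assumptions: "60G" is read as 60 GiB (memory_limit_hard_bytes takes a value in bytes), the `--flag=value` lines follow the usual gflags flagfile form, and `tserver_flags` is a hypothetical helper. On a CDH cluster these values are normally set through Cloudera Manager rather than by editing a flag file by hand.

```python
GIB = 1024 ** 3  # bytes per GiB; assumes "60G" in the post means 60 GiB


def tserver_flags(maintenance_threads, block_cache_mb, hard_limit_gib):
    """Build gflags-style lines for the three settings mentioned in the post."""
    return [
        f"--maintenance_manager_num_threads={maintenance_threads}",
        f"--block_cache_capacity_mb={block_cache_mb}",
        # memory_limit_hard_bytes is expressed in bytes, not GB.
        f"--memory_limit_hard_bytes={hard_limit_gib * GIB}",
    ]


flags = tserver_flags(maintenance_threads=8, block_cache_mb=512, hard_limit_gib=60)
print("\n".join(flags))
```

Note that the hard limit only caps Kudu's own memory use; the memory granted to YARN containers on the same node is accounted for separately.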
Then I tried running an MR job on YARN. With 96 map tasks and 194 GB of memory in use on YARN, the Kudu server was stable. So I ran a few more jobs on YARN to observe Kudu, and the Kudu server was still stable. I concluded it was OK and set up scheduled tasks.
But today, when a job ran 179 map tasks, the Kudu server quit at random again...
This is the memory detail of one of the tablet servers: