Support Questions

yagoaparecidoti · 03-28-2022

We would like to understand two behaviors that Kudu is showing.

1. When we restart a Kudu Tablet Server, it takes between 5 and 10 minutes for that Tablet Server to communicate with the Kudu Master again. Why is this happening?

2. There are 5 Kudu Tablet Servers, and two of them are using more than 90% of the memory set in "memory_limit_hard_bytes". Why is this happening?


jromero · 03-28-2022

Regarding your questions:

1. To understand this behavior, we would need to check the tablet server logs during startup. Most likely, processing the data blocks on startup takes up a good part of those 10 minutes.

2. Those two servers may be overloaded, which can happen if the Kudu cluster is not balanced.

To dig further, the output of "kudu cluster ksck ${MASTER_ADDRS}" and of:

kudu table list ${MASTER_ADDRS} -list_tablets | grep "^ " | cut -d' ' -f6,7 | sort

would give a good picture of your cluster's situation.

If you can attach those here, that would be great for further analysis.
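As a minimal sketch for summarizing that pipeline's output, assuming it leaves one "<tserver_uuid> <host:port>" line per replica (blank lines from the tablet ID rows are skipped), the Python below counts replicas per tablet server and reports the skew, i.e. the difference between the most and least loaded servers. The script name count_replicas.py and all function names are hypothetical, not part of the Kudu CLI.

```python
from collections import Counter


def replicas_per_server(lines):
    """Count replicas per (tablet server UUID, host:port) pair.

    Assumes each replica line is "<tserver_uuid> <host:port>", as left by the
    cut -d' ' -f6,7 step; lines without exactly two fields (for example the
    blank ones produced from tablet ID rows) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            counts[(fields[0], fields[1])] += 1
    return counts


def skew(counts):
    """Replica count of the busiest server minus that of the least busy one.

    Servers hosting no replicas never appear in the input, so they are not
    taken into account here.
    """
    if not counts:
        return 0
    return max(counts.values()) - min(counts.values())


def print_report(lines):
    """Print replicas per tablet server, most loaded first, then the skew."""
    counts = replicas_per_server(lines)
    for (uuid, host), n in counts.most_common():
        print(f"{n:6d}  {host}  {uuid}")
    print("skew:", skew(counts))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 count_replicas.py replicas.txt
    # where replicas.txt holds the output of the pipeline above.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            print_report(f)
```

For example, redirect the pipeline's output to replicas.txt and run python3 count_replicas.py replicas.txt; a skew of 0 or 1 means replicas are spread as evenly as their count allows.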

yagoaparecidoti · 03-29-2022

Hi @jromero

Kudu is correctly balanced.

There are 5 tablet servers with a total of 1173 replicas, and each tablet server hosts about 234 replicas, which shows that the cluster is balanced.

Even though it is balanced, it still shows the two problems mentioned above.
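A quick check on those numbers: 1173 replicas cannot be spread as exactly 234 per server over 5 servers, since 1173 = 5 × 234 + 3. The most even placement puts 235 replicas on three servers and 234 on the other two, so 234 or 235 per server is as balanced as the replica count allows. A small sketch of that arithmetic (the function name is hypothetical):

```python
def ideal_distribution(total_replicas, num_servers):
    """Most even split of replicas across servers (counts differ by at most 1)."""
    base, extra = divmod(total_replicas, num_servers)
    # The first `extra` servers each get one replica more than the rest.
    return [base + 1] * extra + [base] * (num_servers - extra)


print(ideal_distribution(1173, 5))  # [235, 235, 235, 234, 234]
```

An even replica count does not guarantee an even load, though: replicas can differ in size and in the read/write traffic they receive, which is why hotspotting is still worth ruling out.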

jromero · 03-29-2022

Hello @yagoaparecidoti

Being balanced is a good thing. What about hotspotting?

Can you open the Kudu web UI > Tablet Servers, click on the affected servers, and check the metrics and RPC pages on those two?

> How do they look?

> How many RPC calls do they receive compared to a healthy tablet server?

yagoaparecidoti · 03-29-2022

Hi @jromero

The "metrics[1]" and "rpcz[2]" web pages are very long.

What exactly should we look at?
Which parameters do we need to check?

[1] - http://ip_host:8050/metrics
[2] - http://ip_host:8050/rpcz

jromero · 03-29-2022

The inbound/outbound connections on the RPC page.

The rpc_* metrics on the metrics page. Sorry, I don't recall the exact metric names.
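To make that comparison less tedious than scrolling the page, here is a minimal Python sketch that downloads /metrics from two tablet servers, sums every metric whose name starts with "rpc", and computes the ratio affected/healthy. It assumes the JSON served by /metrics is a list of entities, each with a "metrics" list of objects carrying "name" and either "value" or, for histograms, "total_count"; the host names in the usage comment are placeholders and the function names are hypothetical.

```python
import json
import urllib.request


def fetch_metrics(base_url):
    """Download and parse the JSON served by a tablet server's /metrics page."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=10) as resp:
        return json.load(resp)


def rpc_metrics(entities, prefix="rpc"):
    """Sum numeric metrics whose name starts with `prefix` across all entities.

    Assumes each entity has a "metrics" list of {"name": ..., "value": ...}
    objects; histograms are assumed to expose "total_count" instead of "value".
    """
    totals = {}
    for entity in entities:
        for metric in entity.get("metrics", []):
            name = metric.get("name", "")
            if not name.startswith(prefix):
                continue
            value = metric.get("value", metric.get("total_count"))
            if isinstance(value, (int, float)):
                totals[name] = totals.get(name, 0) + value
    return totals


def compare(affected, healthy):
    """Ratio affected/healthy for each metric present (and non-zero) on both."""
    return {
        name: affected[name] / healthy[name]
        for name in affected
        if name in healthy and healthy[name]
    }


# Hypothetical usage (placeholder hosts, tablet server web UI port 8050):
#   affected = rpc_metrics(fetch_metrics("http://affected_host:8050"))
#   healthy = rpc_metrics(fetch_metrics("http://healthy_host:8050"))
#   for name, ratio in sorted(compare(affected, healthy).items(),
#                             key=lambda kv: kv[1], reverse=True):
#       print(f"{ratio:8.2f}x  {name}")
```

Counters are cumulative since the process started, so servers restarted at different times are better compared over the same interval, for example by taking two snapshots a few minutes apart and diffing them.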

yagoaparecidoti · 03-29-2022

Hi @jromero

On the page "http://ip_host:8050/rpcz":

"inbound_connections" shows the addresses of other tablet servers in the "open" state.

"outbound_connections" shows the addresses of other tablet servers and the master in the "open" state.

The same happens on the other tablet servers.

I didn't find anything that could be related to the problems mentioned.
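Open connections by themselves are expected on every tablet server; what may set a hotspotted server apart is how many calls are in flight on those connections and how long they have been running. A minimal sketch, assuming the /rpcz JSON has the "inbound_connections"/"outbound_connections" lists described above, with each connection carrying a "state" and, when busy, a "calls_in_flight" list whose entries may have a "micros_elapsed" field. Those last two field names are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request
from collections import Counter


def fetch_rpcz(base_url):
    """Download and parse the JSON served by a tablet server's /rpcz page."""
    with urllib.request.urlopen(f"{base_url}/rpcz", timeout=10) as resp:
        return json.load(resp)


def summarize_rpcz(rpcz):
    """Summarize connections and in-flight calls per direction.

    Assumes top-level "inbound_connections"/"outbound_connections" lists whose
    entries have a "state" and an optional "calls_in_flight" list, each call
    with an optional "micros_elapsed" field.
    """
    summary = {}
    for direction in ("inbound_connections", "outbound_connections"):
        conns = rpcz.get(direction, [])
        states = Counter(c.get("state", "UNKNOWN") for c in conns)
        calls = [call for c in conns for call in c.get("calls_in_flight", [])]
        slowest = max((call.get("micros_elapsed", 0) for call in calls), default=0)
        summary[direction] = {
            "connections": len(conns),
            "states": dict(states),
            "calls_in_flight": len(calls),
            "slowest_call_micros": slowest,
        }
    return summary


# Hypothetical usage (placeholder host):
#   print(summarize_rpcz(fetch_rpcz("http://affected_host:8050")))
```

Running it a few times against an affected and a healthy tablet server and comparing the "calls_in_flight" and "slowest_call_micros" values gives a rough sense of whether one of them is receiving noticeably more work.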

jromero · 03-30-2022

Hi @yagoaparecidoti

Thanks. In that case, further investigation is needed: we would need to check what is happening on those two tablet servers. If you are able to share the logs from those TS, that would be great; if not, it will be quite hard to tell, and your best bet would be to open a support case to have it checked.

> Can you check the charts in Cloudera Manager > Kudu > Instances > tablet server > Chart Library > Replicas? How do they compare with those of an unaffected TS?

yagoaparecidoti · 04-04-2022

Hi @jromero

Thanks for the feedback.

Unfortunately, we cannot share the TS logs because they contain sensitive information.

We will try to open a ticket with support.

Checking the TS replica chart "Total Tablet Size On Disk Across Kudu Replicas", all tablet servers show the same size.

VidyaSargur · 04-03-2022

@yagoaparecidoti, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information @jromero has requested?

Regards,

Vidya Sargur,
Community Manager


Kudu Tablet taking too long to sync with Kudu Master and Kudu Tablet using 90% memory