Kudu tablet server taking too long to sync with Kudu master and using 90% of its memory

Expert Contributor

We would like to understand two behaviors that Kudu is showing.

1. When we restart a Kudu tablet server, it takes between 5 and 10 minutes before it communicates with the Kudu master again. Why is this happening?

2. There are 5 Kudu tablet servers, and two of them are using more than 90% of the memory set in "memory_limit_hard_bytes". Why is this happening?

1 ACCEPTED SOLUTION

Expert Contributor

Hi @yagoaparecidoti

Thanks. In that case, further investigation will be needed: we would need to check what is happening on those two tablet servers. If you are able to share the logs from those TSs, that would be great; if not, it will be quite hard to tell, and your best bet would be to open a support case to have it checked.

> Are you able to check the charts in Cloudera Manager > Kudu > Instances > tablet server > Chart Library > Replicas? Can you compare those with a non-affected TS?

Expert Contributor

Regarding your questions:

1. To understand this behavior, we would need to check the tablet server logs during startup. Most likely, block processing is taking a good part of those 10 minutes.

2. It is possible that those two servers are overloaded, which can happen if the Kudu cluster is not balanced.

To understand this further, the output of ksck (kudu cluster ksck ${MASTER_ADDRS}) and of:

kudu table list ${MASTER_ADDRS} -list_tablets | grep "^ " | cut -d' ' -f6,7 | sort

would give a good picture of your cluster's situation.

If you can attach both outputs here, that would be great for further analysis.
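When reading the output of the pipeline above, the number of replicas per tablet server is what shows whether the cluster is balanced. A minimal sketch, assuming each output line has the form "<tablet server UUID> <host:port>" (which is what cut -d' ' -f6,7 is meant to select; the column layout may differ between Kudu versions, so check it against your own output). The function name, sample UUIDs, and host names are hypothetical.

```python
from collections import Counter


def replicas_per_server(pipeline_output: str) -> Counter:
    """Count replicas per tablet server from 'uuid host:port' lines.

    Assumption: each relevant line has exactly two whitespace-separated
    fields, as selected by `cut -d' ' -f6,7`; any other line is skipped.
    """
    counts: Counter = Counter()
    for line in pipeline_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or unexpected line
        _uuid, host_port = fields
        counts[host_port] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical pipeline output: one line per replica.
    sample = "\n".join([
        "uuid-a ts1.example:7050",
        "uuid-a ts1.example:7050",
        "uuid-b ts2.example:7050",
        "uuid-c ts3.example:7050",
    ])
    for server, n in replicas_per_server(sample).most_common():
        print(f"{server}\t{n}")
```

In the shell, appending | uniq -c to the same pipeline gives the same per-server counts, since identical sorted lines end up next to each other.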

Expert Contributor

Hi @jromero

Kudu is balanced correctly.

There are 5 tablet servers with a total of 1173 replicas, and each tablet server has about 234 replicas, which shows that the cluster is balanced.

Even though it is balanced, it still shows the two problems mentioned.
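The numbers above can be checked quickly: 1173 replicas over 5 tablet servers is 234.6 per server, so the most even split possible is three servers with 235 replicas and two with 234. A small sketch of that arithmetic, where "skew" is taken here simply as the difference between the largest and smallest per-server replica count (a working definition for this example):

```python
from __future__ import annotations


def even_split(total: int, servers: int) -> list[int]:
    """Most even distribution of `total` replicas over `servers` servers."""
    base, extra = divmod(total, servers)
    # `extra` servers get one replica more than the rest.
    return [base + 1] * extra + [base] * (servers - extra)


def skew(counts: list[int]) -> int:
    """Difference between the most and least loaded server."""
    return max(counts) - min(counts)


split = even_split(1173, 5)
print(split, skew(split))  # [235, 235, 235, 234, 234] 1
```

A skew of 0 or 1 only says the replica counts are even; it says nothing about how much traffic or memory each replica uses, which is why hotspotting is the next thing to rule out.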

Expert Contributor

Hello @yagoaparecidoti

Being balanced is a good thing. How about hotspotting?

Can you check the Kudu web UI > Tablet Servers > click on the affected servers > check the metrics and RPC pages on those two?

> How does it look?

> How many RPC calls do they receive compared to a healthy tablet server?
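Once the same RPC figure has been read from each tablet server, comparing it against the other servers makes an outlier easy to spot. A minimal sketch, assuming one number has already been collected per server (for example an RPC call count); the function name, the threshold of 2x the median, and the sample values are arbitrary choices for illustration, not Kudu defaults:

```python
from __future__ import annotations

from statistics import median


def flag_hotspots(values: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return the servers whose value exceeds `factor` times the median.

    The median is used instead of the mean so that the hot servers
    themselves do not drag the baseline up.
    """
    if not values:
        return []
    baseline = median(values.values())
    return sorted(s for s, v in values.items() if v > factor * baseline)


# Hypothetical RPC counts per tablet server.
rpc_calls = {"ts1": 100, "ts2": 110, "ts3": 95, "ts4": 400, "ts5": 380}
print(flag_hotspots(rpc_calls))  # ['ts4', 'ts5']
```

With five servers and two hot ones, the median still comes from a healthy server; if most servers were hot, this baseline would no longer be meaningful.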

 

Expert Contributor

Hi @jromero

The "metrics" [1] and "rpcz" [2] web pages are very long.

What should we actually look at?
Which parameters do we need to check?

[1] - http://ip_host:8050/metrics
[2] - http://ip_host:8050/rpcz

Expert Contributor

The inbound/outbound connections on the rpcz page.

The rpc_* metrics on the metrics page. Sorry, I don't recall the exact metric names.
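Because the metrics page is long, filtering it programmatically can help. A minimal sketch, assuming the /metrics endpoint returns a JSON list of entities, each with a "type" and a list of "metrics" entries carrying a "name" plus either a "value" (counters and gauges) or a "total_count" (histograms); please verify this structure against the output of your Kudu version. ip_host:8050 is the address used in this thread, and the function names and sample metric names are hypothetical.

```python
import json
import urllib.request


def fetch_metrics(base_url: str) -> list:
    """Download and parse the /metrics JSON, e.g. from 'http://ip_host:8050'."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=10) as resp:
        return json.load(resp)


def rpc_metrics(entities: list, prefix: str = "rpc") -> dict:
    """Collect server-level metrics whose name starts with `prefix`.

    Counters and gauges are read from "value", histograms from
    "total_count"; entries without a numeric value are skipped.
    Values are summed in case more than one server entity is present.
    """
    found = {}
    for entity in entities:
        if entity.get("type") != "server":
            continue  # skip tablet- and table-level entities
        for metric in entity.get("metrics", []):
            name = metric.get("name", "")
            if not name.startswith(prefix):
                continue
            value = metric.get("value", metric.get("total_count"))
            if isinstance(value, (int, float)):
                found[name] = found.get(name, 0) + value
    return found


if __name__ == "__main__":
    # Illustrative structure only; real names and values come from your server.
    sample = [
        {"type": "server", "id": "kudu.tabletserver", "metrics": [
            {"name": "rpc_example_counter", "value": 42},
            {"name": "rpc_example_latency", "total_count": 7},
            {"name": "non_rpc_example", "value": 1024},
        ]},
        {"type": "tablet", "id": "some-tablet", "metrics": [
            {"name": "rpc_example_counter", "value": 5},
        ]},
    ]
    print(rpc_metrics(sample))
    # Against a live server:
    # print(rpc_metrics(fetch_metrics("http://ip_host:8050")))
```

Running the same filter against an affected and a healthy tablet server and comparing the two dictionaries side by side is easier than scrolling both pages.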

Expert Contributor

Hi @jromero

On the page "http://ip_host:8050/rpcz":

"inbound_connections" shows the addresses of other tablet servers in the "open" state.

"outbound_connections" shows the addresses of other tablet servers and the master in the "open" state.

The same happens on the other tablet servers.

I didn't find anything that could be related to the problems mentioned.
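Connection state alone ("open") will look the same on every server; the number of calls in flight on those connections is closer to the RPC-call comparison suggested earlier in the thread. A minimal sketch for summarizing /rpcz, assuming its JSON has "inbound_connections" and "outbound_connections" lists (as described above) whose entries carry a "state" and, while calls are active, a "calls_in_flight" list. The "calls_in_flight" and "remote_ip" field names and the function names are assumptions to check against your own output.

```python
import json
import urllib.request


def summarize_rpcz(rpcz: dict) -> dict:
    """Count open connections and in-flight calls per direction."""
    summary = {}
    for direction in ("inbound_connections", "outbound_connections"):
        conns = rpcz.get(direction, [])
        summary[direction] = {
            # Compare case-insensitively: the page may show "OPEN" or "open".
            "open": sum(1 for c in conns if str(c.get("state", "")).lower() == "open"),
            # Assumed field name; a missing list means no calls in flight.
            "calls_in_flight": sum(len(c.get("calls_in_flight", [])) for c in conns),
        }
    return summary


def fetch_rpcz(base_url: str) -> dict:
    """Download and parse /rpcz, e.g. from 'http://ip_host:8050'."""
    with urllib.request.urlopen(f"{base_url}/rpcz", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical sample shaped like the description above.
    sample = {
        "inbound_connections": [
            {"remote_ip": "10.0.0.2:40000", "state": "OPEN",
             "calls_in_flight": [{}, {}]},
            {"remote_ip": "10.0.0.3:40001", "state": "OPEN"},
        ],
        "outbound_connections": [
            {"remote_ip": "10.0.0.1:7051", "state": "OPEN"},
        ],
    }
    print(summarize_rpcz(sample))
```

Since /rpcz is a point-in-time snapshot, sampling it several times on an affected and a healthy server gives a fairer comparison than a single read.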


Expert Contributor

Hi @jromero

Thanks for the feedback.

Unfortunately, we cannot make the TS logs available because they contain sensitive information.

We will try to open a ticket with support.

Checking the TS replica chart "Total Tablet Size On Disk Across Kudu Replicas", all tablet servers show the same size.

Community Manager

@yagoaparecidoti, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information @jromero has requested?



Regards,

Vidya Sargur,
Community Manager

