Kudu Tablet Server High CPU
Labels: Apache Kudu
Created on 03-22-2019 10:49 PM - edited 09-16-2022 07:14 AM
I have a cluster with 3 tservers. When running a workload that heavily reads from the cluster, 1 of the 3 tservers is reaching nearly 100% CPU utilization while the other two are less than 10%. The tablets are equally balanced amongst the 3.
I am thinking that by chance, all my data used in this particular workload happens to reside on the 1 tserver.
Thoughts? How might I diagnose this further?
Created 03-23-2019 04:30 PM
Before we get into metrics and other lower-level troubleshooting techniques, let's start with how you're reading.
What are you using to read? If it's the raw Kudu API, are you using the LEADER_ONLY replica selection policy? If so, and if your three-node cluster is heavily skewed so that most leader replicas are on one node, that node may be servicing the majority of your scans.
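To make the skew argument concrete, here is a toy model in plain Python. It does not use the Kudu client; the tablet names, tserver names, and the 19-of-20 leader layout are all made up for illustration. It only shows that when every scan goes to the tablet's leader, the scan load per tserver mirrors the leader distribution.

```python
from collections import Counter

# Hypothetical tserver names, not taken from any real cluster.
TSERVERS = ["tserver-1", "tserver-2", "tserver-3"]

def scans_per_tserver(leaders):
    """Under a leader-only policy, each tablet scan lands on that tablet's leader."""
    counts = Counter({ts: 0 for ts in TSERVERS})
    counts.update(leaders.values())
    return dict(counts)

# Hypothetical layout: 20 tablets, 19 of them led by tserver-1.
skewed = {f"tablet-{i}": "tserver-1" for i in range(19)}
skewed["tablet-19"] = "tserver-2"

print(scans_per_tserver(skewed))
```

In this sketch, one scan per tablet puts 19 of 20 scans on tserver-1 even though every tserver holds a replica of every tablet.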
Created 03-23-2019 11:38 PM
How can I find the balance of the leader replicas amongst the nodes?
Created 03-24-2019 11:16 AM
If you don't need the stronger consistency guarantees of LEADER_ONLY, change your replica selection policy to CLOSEST_REPLICA. That should give a more even distribution of reads, provided your scan requests originate evenly from the cluster's nodes.
Created on 03-24-2019 01:48 PM - edited 03-24-2019 02:18 PM
Will the rebalancer distribute the leaders evenly across the cluster? That is not clear from the docs. It seems to balance only the replicas; should that result in the leaders being balanced as well?
Let’s say I only have 1 host reading from the cluster and I select CLOSEST_REPLICA. Won’t I end up in the same situation? How does the master distribute load? By IP address hash? Can I change this to round robin, or is this something controlled from the client side?
For others reading this post, I was able to identify the leader tablet distribution without using the rebalance tool. I am on an older Kudu that does not provide the tool.
I copied and pasted the live tablet info from the UI of the t-servers into Excel and found that 95% of the leaders are on the first t-server.
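For anyone who would rather script the tally than use a spreadsheet, a minimal sketch follows. It assumes the pasted tablet info has already been reduced to tab-separated rows of tablet ID and leader host; that two-column layout, the function name, and the sample rows are assumptions for illustration, not the actual format of the tserver UI.

```python
import csv
import io
from collections import Counter

def leaders_per_tserver(pasted_text):
    """Count leader replicas per tserver from tab-separated (tablet_id, leader) rows."""
    reader = csv.reader(io.StringIO(pasted_text), delimiter="\t")
    # Skip blank or malformed rows that have fewer than two columns.
    return Counter(row[1].strip() for row in reader if len(row) >= 2)

# Hypothetical sample data standing in for the rows copied from the UI.
sample = "t1\ttserver-1\nt2\ttserver-1\nt3\ttserver-2\nt4\ttserver-1\n"

counts = leaders_per_tserver(sample)
total = sum(counts.values())
for host, n in counts.most_common():
    print(f"{host}: {n} leaders ({n / total:.0%})")
```

A single tserver holding most of the total is the same kind of skew described above.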
Thanks
Created 03-24-2019 02:52 PM
You're right that if you're only using one host to initiate reads, the reads will go to the local tserver rather than round-robin across the cluster. The master doesn't directly tell clients where to scan; it just provides them with enough information to make that decision based on their replica selection policy. There's also no way to do round robin (or randomized) replica selection.
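As a rough illustration of that point, the toy model below mimics a client-side "prefer a local replica, otherwise pick any" choice. It is not the Kudu client's actual code. It assumes every tablet has a replica on all three tservers and that client hosts share names with the tservers they are colocated with; all names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical tserver names; each tablet is assumed to have a replica on all three.
TSERVERS = ["tserver-1", "tserver-2", "tserver-3"]

def pick_closest(client_host, replicas, rng):
    """Prefer a replica on the client's own host; otherwise pick one at random."""
    if client_host in replicas:
        return client_host
    return rng.choice(replicas)

def simulate(client_hosts, num_tablets=30, seed=0):
    """Issue one scan per tablet, cycling through the given client hosts."""
    rng = random.Random(seed)
    counts = Counter({ts: 0 for ts in TSERVERS})
    for i in range(num_tablets):
        client = client_hosts[i % len(client_hosts)]
        counts[pick_closest(client, TSERVERS, rng)] += 1
    return dict(counts)

print(simulate(["tserver-1"]))  # one client host, colocated with tserver-1
print(simulate(TSERVERS))       # clients spread across all three hosts
```

With a single colocated client host, every scan in this model lands on tserver-1. With clients on all three hosts, the scans split evenly.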
