I recently ran into a Kudu issue that occurs intermittently. The details are below.
CDH Version: 5.14.2
KUDU Version: 1.6.0-cdh5.14.2
21/11/28 06:56:03 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.kudu.client.NoLeaderFoundException: Master config (10.186.93.6:7051,10.186.93.24:7051,10.186.93.8:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: connection disconnected,org.apache.kudu.client.RecoverableException: connection disconnected
org.apache.kudu.client.NoLeaderFoundException: Master config (10.186.93.6:7051,10.186.93.24:7051,10.186.93.8:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: connection disconnected,org.apache.kudu.client.RecoverableException: connection disconnected
Caused by: org.apache.kudu.client.RecoverableException: connection disconnected
... 55 more
This issue happens when I submit a Spark job to YARN. Most of the time it runs fine, but the error shows up several times per day.
Every time it happens I have to re-submit the job to YARN. Luckily, the second attempt always succeeds.
Is this a bug or something else? Is there a fix or a workaround?
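Until the root cause is known, the manual re-submit could be approximated inside the job by retrying the step that connects to Kudu. Below is a minimal sketch of retry with exponential backoff. It is written in Python for brevity, and the same pattern carries over to a Scala or Java driver. The names run_with_retries, action, attempts and base_delay_s are made up for illustration and are not part of any Kudu or Spark API.

```python
import time


def run_with_retries(action, attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts are used up.

    Waits base_delay_s after the first failure, then doubles the wait
    after each further failure. Re-raises the last error if every
    attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as err:
            # Assumption: in a real job, catch only the client exception
            # you consider retryable instead of every Exception.
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay_s * (2 ** attempt))
    raise last_error
```

This only hides the symptom. If the masters really lose their leader several times a day, that still needs to be explained.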
Kudu has a hard requirement on up-to-date NTP synchronization. Kudu masters and tablet servers will crash when their clocks are out of sync.
You can check whether the NTP service on your nodes is in sync.
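One way to run that check on a node is sketched below. It assumes the host runs ntpd and has the ntpstat utility installed; hosts using chrony would need a different command. The exit codes 0, 1 and 2 are the ones listed in the ntpstat man page. The function names are made up for illustration.

```python
import subprocess

# Exit codes as documented in the ntpstat man page.
NTPSTAT_STATES = {
    0: "synchronised",
    1: "unsynchronised",
    2: "indeterminate",
}


def describe_ntpstat(exit_code):
    """Translate an ntpstat exit code into a readable state."""
    return NTPSTAT_STATES.get(exit_code, "unknown")


def check_local_clock():
    """Run ntpstat on this host and return (state, command output).

    Assumption: ntpstat is on the PATH. If it is not installed,
    subprocess.run raises FileNotFoundError.
    """
    result = subprocess.run(["ntpstat"], capture_output=True, text=True)
    return describe_ntpstat(result.returncode), result.stdout.strip()
```

Running check_local_clock() on each master and tablet server host would show whether any single node has drifted, which a cluster-wide health summary might average away.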
Thanks for your reply, but the NTP sync looks fine, since the CDH cluster already has a health check for clock synchronization.
Do you have any configuration suggestions for my Kudu cluster?
I have 3 masters and 76 Kudu tablet servers. Each node has about 48 cores, 384 GB of memory, and 5 TB of disk space.
We have a heavy read/write workload. Could that be the cause?
If the NTP service is fine, I have no other idea what causes the issue.
I searched some blogs, and some people recommend using hostnames instead of IP addresses.
You can give that a try.
I am curious about this issue and will keep an eye on it.
Looking forward to seeing the solution.
Good luck!
Switch the connection from IP addresses to hostnames?
That sounds like taking the long way around instead of the short one.
But maybe it works. OK, I'll give it a try.
I'll report back with good news, I hope.
Sad news: it did not work. The error message changed to "org.apache.kudu.client.NoLeaderFoundException: Master config (hostnamex:7051,hostnamex:7051,hostnamex:7051)"
What could cause this issue? Could it be heavy load on the Kudu cluster?
You should always use the fully qualified hostnames, instead of IP addresses.
Although it's longer, it can prevent problems. If your cluster is using TLS, for example, the full hostnames are required.
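As a rough illustration, the sketch below flags entries in a comma-separated master list that are still IP literals and asks the local resolver for a fully qualified name. It handles IPv4 addresses and hostnames only. The helper names are made up, and 7051 is used as the fallback port only because it is the port shown in the error message.

```python
import ipaddress
import socket


def is_ip_literal(host):
    """Return True if host parses as an IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def split_host_port(entry, default_port=7051):
    """Split 'host:port' into (host, port); use default_port if absent."""
    host, sep, port = entry.rpartition(":")
    if not sep:
        return entry, default_port
    return host, int(port)


def find_ip_entries(master_addresses):
    """Return the entries of a comma-separated list that use IP literals."""
    flagged = []
    for entry in master_addresses.split(","):
        entry = entry.strip()
        host, _port = split_host_port(entry)
        if is_ip_literal(host):
            flagged.append(entry)
    return flagged


def fqdn_for(host):
    """Ask the local resolver for the fully qualified name of a host or IP."""
    return socket.getfqdn(host)
```

For example, find_ip_entries("10.186.93.6:7051,10.186.93.24:7051,10.186.93.8:7051") returns all three entries, and fqdn_for can then suggest a replacement name for each one, provided forward and reverse DNS are set up on the host running the check.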
I imagine these logs that you shared are client logs, right?
Can you check the Tablet Server logs and see if there are errors in them? Those would help understand the issue.
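When searching those logs, keep in mind that Kudu servers log in glog format, where each line starts with a one-letter severity (I, W, E or F) followed by the date as mmdd, so the word "ERROR" does not have to appear in a line. Warnings around the failure time can be just as informative. Here is a minimal sketch that filters lines by severity and time window. The function name and the sample lines in the usage note are placeholders, not real Kudu output. Since the exception names the master config, the master logs may be worth the same treatment.

```python
import re

# glog line prefix: severity letter, mmdd, hh:mm:ss.micros, thread id, file:line]
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})\.\d+\s+\d+\s+"
    r"(?P<src>\S+)\] (?P<msg>.*)$"
)


def notable_lines(lines, date, start, end, severities="WEF"):
    """Return (severity, time, source, message) tuples for matching lines.

    date is 'mmdd'; start and end are 'hh:mm:ss' strings, both inclusive.
    Lines that do not look like glog output are skipped.
    """
    hits = []
    for line in lines:
        match = GLOG_LINE.match(line)
        if not match:
            continue
        if match.group("sev") not in severities:
            continue
        if match.group("date") != date:
            continue
        if start <= match.group("time") <= end:
            hits.append(
                (
                    match.group("sev"),
                    match.group("time"),
                    match.group("src"),
                    match.group("msg"),
                )
            )
    return hits
```

For the failure at 06:56:03 on 28 November, a call such as notable_lines(open(path), "1128", "06:55:00", "06:57:00") would list warnings, errors and fatal messages from the minute before and after, where path is whatever log file you are inspecting.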
Thanks for the response.
The logs are indeed from the client. I searched the Kudu logs for ERROR, but found nothing.
I have now replaced the IP addresses with hostnames.
Hope it works.