Created on 06-16-2021 08:49 PM - edited 09-16-2022 07:42 AM
Hello,
I would like some guidance on data distribution across Kudu tablet servers (T-Servers).
We have a Kudu cluster of 3 masters and 9 T-Servers (each T-Server has 1 TB of storage). We are noticing that space on some T-Servers is being consumed rapidly, whereas on others far less is being used. I would like to know why this is happening and whether there is any way to overcome it, so that data is distributed evenly across all 9 T-Servers.
Kudu 1.7.0-cdh5.16.2/ CM 5.16.2
Appreciate any assistance in this regard.
Thanks
Wert
Created 06-17-2021 12:39 PM
Hi @wert_1311 ,
Check the tablet distribution across tablet servers. If a tablet server goes down or becomes unavailable for some reason, its data is re-replicated to other tablet servers.
You can get the number of tablets per tablet server using this command:
sudo -u kudu kudu table list <csv of master addresses> -list_tablets | grep "^ " | cut -d' ' -f6,7 | sort | uniq -c
If you find that the tablet distribution is uneven, you can use the kudu rebalance tool to balance your cluster.
Let me know how it goes.
If that answers your question, please mark this post as "accept as solution".
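For anyone who prefers to post-process saved output rather than chain grep/cut/uniq, here is a minimal Python sketch of the same tally. It assumes the listing format implied by the pipeline above: replica lines indented by four spaces, followed by a role letter, the tablet server UUID, and host:port. The sample text, host names, and the function name are made up for illustration; check them against what your Kudu version actually prints.

```python
from collections import Counter


def count_replicas_per_tserver(listing: str) -> Counter:
    """Tally tablet replicas per tablet server from saved listing text.

    Assumed (not verified) format: replica lines are indented by four
    spaces and look like '    <role> <uuid> <host:port>'.
    """
    counts = Counter()
    for line in listing.splitlines():
        if not line.startswith("    "):
            continue  # table names and tablet-id lines are less indented
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that does not look like a replica line
        _role, uuid, address = fields
        counts[(uuid, address)] += 1
    return counts


# Hypothetical sample standing in for the real command output.
SAMPLE = """\
demo_table
  T tablet-aaa
    L uuid-1 ts1.example.com:7050
    V uuid-2 ts2.example.com:7050
    V uuid-3 ts3.example.com:7050
  T tablet-bbb
    V uuid-1 ts1.example.com:7050
    L uuid-2 ts2.example.com:7050
    V uuid-4 ts4.example.com:7050
"""

replica_counts = count_replicas_per_tserver(SAMPLE)
for (uuid, address), n in sorted(replica_counts.items()):
    print(n, uuid, address)
```

Note that this counts replicas only; it says nothing about how many bytes each replica occupies on disk.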
Regards,
Created 06-17-2021 08:43 PM
Hi @kingpin
I ran the command, ran a rebalance report, and did a rebalance too; however, the result I was looking for was not achieved (space is still over-consumed on one TS). I think the rebalancer just distributes tablets evenly across all TSs. What I am looking for is something like the HDFS balancer, and I don't think that exists in Kudu. Correct me if I am wrong.
Thanks
Wert
Created 06-21-2021 01:29 AM
Hi @wert_1311
That's right, the rebalancer only balances the number of tablets across the Kudu cluster. If one host is consuming more space, it could be that the tablets it holds are very large.
Kudu can't rebalance based on disk usage the way the HDFS balancer does.
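To make the point concrete, here is a small Python illustration with entirely made-up tablet sizes (not measurements from any real cluster). It shows how three servers can hold the same number of tablets while using very different amounts of disk.

```python
# Hypothetical on-disk sizes in GB for 12 tablet replicas: three large
# tablets and nine small ones. These numbers are invented for illustration.
tablet_sizes_gb = [200, 180, 150, 5, 5, 5, 5, 5, 5, 5, 5, 5]

NUM_TSERVERS = 3
TABLETS_PER_SERVER = len(tablet_sizes_gb) // NUM_TSERVERS

# One placement that is perfectly balanced by tablet count: each server
# gets a contiguous chunk of four tablets.
per_server = [
    tablet_sizes_gb[i * TABLETS_PER_SERVER:(i + 1) * TABLETS_PER_SERVER]
    for i in range(NUM_TSERVERS)
]

counts = [len(tablets) for tablets in per_server]
usage_gb = [sum(tablets) for tablets in per_server]

for server, (n, used) in enumerate(zip(counts, usage_gb), start=1):
    print(f"tserver {server}: {n} tablets, {used} GB")
```

Every server ends up with four tablets, yet the first one carries 535 GB while the other two carry 20 GB each. A count-based balancer would consider this placement balanced.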
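One workaround you can try:
- Stop the Kudu TS role on the host that is consuming too much space.
- Run ksck repeatedly until it reports the cluster as healthy, meaning the stopped server's tablets have been re-replicated on the remaining tablet servers.
- Once ksck is healthy, rebuild that particular Kudu TS (rebuilding = wiping all of its data and WAL directories):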
https://kudu.apache.org/docs/administration.html#rebuilding_kudu
- Start that specific TS.
- Run the rebalancer again.
That should help. Let me know how it goes.
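The "run ksck until it is healthy" step can be scripted. Below is a minimal Python sketch, assuming that `kudu cluster ksck <master addresses>` exits with status 0 when the cluster is healthy and non-zero otherwise. The function names, retry count, delay, and master addresses are all hypothetical; the destructive rebuild step is deliberately left out and should be done by hand following the linked documentation.

```python
import subprocess
import time
from typing import Callable


def ksck_is_healthy(master_addresses: str) -> bool:
    """Run `kudu cluster ksck`; treat exit status 0 as healthy (assumption)."""
    result = subprocess.run(
        ["kudu", "cluster", "ksck", master_addresses],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 60,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Example call (not executed here; the addresses are placeholders):
# masters = "master1:7051,master2:7051,master3:7051"
# if wait_until_healthy(lambda: ksck_is_healthy(masters)):
#     print("ksck is healthy; safe to proceed with the rebuild step")
```

The health check is passed in as a callable so the retry loop can be exercised without a live cluster.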
Cheers,
~ If that answers your question, please give a thumbs up and mark the post as "accept as solution".
Created 06-22-2021 10:29 PM
@wert_1311, has @kingpin's reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,