11-24-2016 04:26 AM - edited 11-24-2016 04:28 AM
We're seeing some possible issues with the number of open file descriptors, and with the size of the underlying files, on one of our Kudu installations. This installation currently has 7 tablet servers, with a grand total of around 257,500 open file descriptors spread over those servers.
It currently holds only 3 tables, with a hash partition strategy on all three resulting in 75 buckets total (25 per table). Two tables are only 3 columns and one has 5.
Of those 257k open file descriptors, about 128k are for .data files. The issue with these files is that most do not grow beyond a few KB or, at most, a few MB. About 111k of those data files are smaller than 1 MB.
Can someone shed some light on why Kudu is using such a large number of files while only occupying a few KB within each of them? I can't relate the sheer number of files to either the partition strategy or the number of columns in the tables.
We are inserting data into those Kudu (1.1.0) tables using Impala_Kudu (2.7.0-1) on a CDH 5.7.0 cluster.
We're also seeing Kudu generate far more disk I/O than the amount of data we are inserting would explain, by quite a large factor. We're still looking into that.
Edit: in addition, the --block_manager setting according to the flagz page is set to "log" and hasn't been changed (as far as we know) since installation. We're running on an ext4 filesystem.
11-24-2016 11:04 AM
11-24-2016 11:43 AM
The block_manager flag shows "log" on all tablet servers.
Upon reboot, while Kudu appears to be reading/checking all its tablet data on disk, we see the open file descriptors go from basically 0 to about 40k per server relatively quickly.
The graph showing open file descriptors appears to follow the same path as the Total Tablet Size On Disk Across Kudu Replicas graph upon reboot.
The reboot process takes at least 45 to 60 minutes before all tablets show a healthy state (checked by running kudu cluster ksck). I may do a reboot tonight to show the graph and numbers.
I ran the ls -la command in one of the data dirs on one of the servers. I've uploaded the output to Dropbox since it's a bit much to copy and paste here. If you need more, I can run it on all data dirs and collect the output into one file.
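In case a summary is more useful than the raw listing, here is a minimal sketch of how the same information could be condensed per data dir. The function name and the 1 MB threshold are my own choices, and it assumes the container files sit directly in the directory passed in. It reports both the apparent size and the allocated size, since a file with reclaimed space can differ between the two.

```python
import os


def summarize_data_files(data_dir, small_threshold=1024 * 1024):
    """Count .data files in data_dir and how many are below small_threshold bytes.

    Hypothetical helper for illustration; not part of Kudu.
    """
    summary = {
        "data_files": 0,
        "small_files": 0,
        "apparent_bytes": 0,
        "allocated_bytes": 0,
    }
    for entry in os.scandir(data_dir):
        if not entry.is_file() or not entry.name.endswith(".data"):
            continue
        st = entry.stat()
        summary["data_files"] += 1
        summary["apparent_bytes"] += st.st_size
        # On Linux, st_blocks counts 512-byte units actually allocated,
        # so space returned to the filesystem is not included here.
        summary["allocated_bytes"] += getattr(st, "st_blocks", 0) * 512
        if st.st_size < small_threshold:
            summary["small_files"] += 1
    return summary
```

Calling `summarize_data_files("/path/to/datadir")` once per data dir (the path is a placeholder) would give one small dictionary per directory instead of a full listing.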
11-24-2016 11:56 AM
You are running smack dab into one of the scalability issues currently facing Kudu. The log block manager stores data in containers (each a pair of .data and .metadata files). Each container is allowed to grow up to 10 GB before it is considered "full" and a new container is created for additional data. When Kudu needs to delete a block of data, it "punches a hole" in a container, allowing the filesystem to reclaim that space. Since the space was reclaimed, the hole is never actually reused by Kudu.
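To make the behavior above concrete, here is a toy model of it. This is only a sketch under simplifying assumptions, not Kudu's actual code: the class and its names are invented for illustration, and the size limit is tiny so the effect is easy to see. The point is that the append position only moves forward, so deletes free space on disk without ever making room for new blocks in the same container.

```python
class ToyContainer:
    """Toy model of an append-only container with hole punching (not Kudu's code)."""

    def __init__(self, max_size):
        self.max_size = max_size  # offset at which the container counts as "full"
        self.next_offset = 0      # append position; only ever moves forward
        self.live = {}            # block_id -> (offset, length)

    @property
    def full(self):
        return self.next_offset >= self.max_size

    @property
    def live_bytes(self):
        return sum(length for _, length in self.live.values())

    def append(self, block_id, length):
        if self.full:
            raise RuntimeError("container is full; a new one must be created")
        self.live[block_id] = (self.next_offset, length)
        self.next_offset += length

    def delete(self, block_id):
        # The space is handed back to the filesystem, but next_offset is
        # unchanged, so the hole is never written to again.
        del self.live[block_id]


container = ToyContainer(max_size=100)
for i in range(10):
    container.append(i, 10)
for i in range(9):
    container.delete(i)
```

After this sequence the toy container is "full" by offset while holding only 10 live bytes, which is the same shape as a deployment with many containers that each occupy very little space.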
The net result: for long-lived deployments such as yours, the number of small containers (and thus open file descriptors) grows unbounded. We're addressing this in two ways:
1) Decoupling the container growth from file descriptor growth using a cache. This has been implemented and is out for review (https://gerrit.cloudera.org/5146 and https://gerrit.cloudera.org/#/c/5147). If you're comfortable rebuilding Kudu from source, you can apply those two patches, rebuild the Kudu tserver, and give them a shot. You should see each tserver using a fixed number of file descriptors henceforth.
2) Compacting really tiny containers (such as those <1MB) and allowing them to be reused. This should reduce the total number of containers on disk without (hopefully) increasing write amplification too much. It has yet to be implemented.
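To illustrate the idea behind the first approach, here is a minimal sketch of a least-recently-used file handle cache. It is not the code in the patches linked above; the class name, methods, and eviction policy are assumptions made for illustration. It shows how the number of open descriptors can stay fixed at a chosen capacity no matter how many container files exist.

```python
from collections import OrderedDict


class FileHandleCache:
    """Keeps at most `capacity` files open, evicting the least recently used.

    Illustrative sketch only; not Kudu's implementation.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._open = OrderedDict()  # path -> file object, oldest first

    def get(self, path):
        handle = self._open.pop(path, None)
        if handle is None:
            if len(self._open) >= self.capacity:
                # Close the least recently used handle to stay within capacity.
                _, evicted = self._open.popitem(last=False)
                evicted.close()
            handle = open(path, "rb")
        # Re-insert at the end to mark this path as most recently used.
        self._open[path] = handle
        return handle

    def open_count(self):
        return len(self._open)

    def close_all(self):
        while self._open:
            _, handle = self._open.popitem()
            handle.close()
```

With a cache like this, reading from thousands of containers still holds only `capacity` descriptors open at once, at the cost of reopening a file when it has been evicted.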
Long story short: help is on the way! At the very least, the Kudu 1.2 release will include a fix for the first issue.