I came across this statement from Todd Lipcon about Kudu:
"However, we put a lot of effort into ensuring that adding and dropping columns is very fast and low-impact – right now it blocks inserts for a couple of seconds on each server, one at a time, but we have a design that we can move to later which would only block for a couple hundred microseconds."
This quote is from a comment on an article from September 2015 (you can find it here).
Do you know if those improvements ever happened?
I'm evaluating Kudu for one of our use cases at the moment, and since our schema tends to change a lot, I want to make sure that it supports adding and dropping columns quickly (even on a 100 TB table with thousands of partitions) and without affecting availability or ongoing queries.
Does anyone have experience with or insight into this use case?
Unfortunately we have not prioritized speeding up ALTER TABLE operations as mentioned in the post you quoted.
We've found that for most users and customers, ALTER TABLE is an infrequent operation, so the few seconds it typically takes is not problematic. I'd be interested to learn more about your use case, though, to help us prioritize.
I'm working on building a fast serving layer for our data. We need to let our customers query our data with sub-second latency and hoped to use Kudu + Impala to achieve that. However, our data tends to be very dynamic and new fields are created all the time (in one use case this is even triggered by a customer's operation). So I need to make sure that adding a new column does not interfere with data ingestion (which runs at a very high rate) or with customers' queries.
Could you please explain the effects of adding a new column?