question Re: kudu is slower than parquet? in Support Questions

https://community.cloudera.com/t5/Support-Questions/kudu-is-slower-than-parquet/m-p/56504#M14061

We'd expect Kudu to be slower than Parquet on a pure read benchmark, but not 10x slower; a gap that large may indicate a configuration problem. We've previously published results on the Cloudera blog that demonstrate this: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

Parquet is a read-only storage format, while Kudu supports row-level updates, so they make different trade-offs. I think we have headroom to significantly improve the performance of both table formats in Impala over time. For example, in Impala 2.9/CDH5.12, IMPALA-5347 (http://issues.cloudera.org/browse/IMPALA-5347) and IMPALA-5304 improve pure Parquet scan performance by 50%+ on some workloads, and I think there are probably similar opportunities for Kudu.

Tim Armstrong, Mon, 26 Jun 2017 15:41:17 GMT