Thanks for the Impala presentation today. I heard a couple of things that interested me.
1) The presenter mentioned early in the presentation that data could be co-located on nodes. Does this mean that Impala / Parquet will be able to intelligently determine the location of data to optimize joins for tables with similar key values?
2) I was interested to know if there were any benchmarks that measured Impala query speed vs. competitors in a mixed workload scenario, i.e. running queries while data loads are occurring. I was directed to the following URL, but did not find a benchmark there that covers this scenario. (http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/)