New Contributor
Posts: 1
Registered: ‎04-09-2014

Thanks for the Impala presentation today. I heard a couple of things that interested me.

1) Early in the presentation, the presenter mentioned that data could be co-located on nodes. Does this mean that Impala/Parquet will be able to intelligently determine where data is located in order to optimize joins between tables that share similar key values?

2) I was interested to know whether there are any benchmarks that measure Impala query speed against competitors in a mixed-workload scenario, i.e. running queries while data loads are occurring. I was directed to the following URL, but did not find a benchmark that illustrated this test: (http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/)

Posts: 1,880
Kudos: 422
Solutions: 297
Registered: ‎07-31-2013

Re: Thanks for the Impala presentation today. I heard a couple of things that interested me.

1 - IIUC, co-locating data and compute helps the I/O and caching efforts that bring speedups to all forms of queries. I'll let other Impala experts comment on whether joins specifically benefit as well.

2 - The blog post lists the set of queries that were used, and their repository is https://github.com/cloudera/impala-tpcds-kit. Is this what you're looking for?