Thanks for the Impala presentation today. I heard a couple of things that interested me.

New Contributor

Thanks for the Impala presentation today. I heard a couple of things that interested me.

1) The presenter mentioned early in the presentation that data could be co-located on nodes. Does this mean that Impala / Parquet will be able to intelligently determine the location of data to optimize joins for tables with similar key values?

2) I was interested to know whether there are any benchmarks that measure Impala query speed vs. competitors in a mixed-workload scenario, i.e. running queries while data loads are occurring. I was directed to the following URL, but did not find a benchmark that covers this test: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/

1 REPLY

Re: Thanks for the Impala presentation today. I heard a couple of things that interested me.

Master Guru
1 - IIUC, the data-and-compute co-location helps the IO and caching efforts that bring speedups to all forms of queries. I'll let other Impala experts comment on whether joins are specifically affected as well.

2 - The blog post mentions the set of queries that was used, and the repo for them is: https://github.com/cloudera/impala-tpcds-kit. Is this what you're looking for?