Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Impala multiple table scans when doing pivot

Highlighted

Impala multiple table scans when doing pivot

New Contributor

In doing benchmarks between Impala and PrestoDB, we noticed that building pivot tables is quite difficult in Imapala because it does not have Cube operators, like Presto does. Here are two examples in Presto:

 

The CUBE operator generates all possible grouping sets (i.e. a power set) for a given set of columns. For example, the query:`

SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY CUBE (origin_state, destination_state);


is equivalent to:

SELECT origin_state, destination_state, sum(package_weight) FROM shipping GROUP BY GROUPING SETS ( (origin_state, destination_state), (origin_state), (destination_state), ());


Another example is the ROLLUP operator. Full documentation is here:https://prestodb.io/docs/current/sql/select.html.


It's not syntatic sugar because PRESTO perform one table scan for whole query - so using this operators you can build pivot table in one request Impala need to run 2-3 queries.

 

Is there a way in which we can do this with one query / table-scan in Impala instaead of 3? Otherwise the performance becomes terrible on creating any type of pivot table.