28798
DISCUSSIONS
102165
MEMBERS
3161
ARTICLES
Created 02-10-2017 03:06 PM
In doing benchmarks between Impala and PrestoDB, we noticed that building pivot tables is quite difficult in Imapala because it does not have Cube operators, like Presto does. Here are two examples in Presto:
The CUBE operator generates all possible grouping sets (i.e. a power set) for a given set of columns. For example, the query:`
SELECT origin_state, destination_state, sum(package_weight) FROM shipping GROUP BY CUBE (origin_state, destination_state);
is equivalent to:
SELECT origin_state, destination_state, sum(package_weight) FROM shipping GROUP BY GROUPING SETS ( (origin_state, destination_state), (origin_state), (destination_state), ());
Another example is the ROLLUP operator. Full documentation is here:https://prestodb.io/docs/current/sql/select.html.
It's not syntatic sugar because PRESTO perform one table scan for whole query - so using this operators you can build pivot table in one request Impala need to run 2-3 queries.
Is there a way in which we can do this with one query / table-scan in Impala instaead of 3? Otherwise the performance becomes terrible on creating any type of pivot table.