Welcome to the Cloudera Community

Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Who agreed with this topic

Impala multiple table scans when doing pivot

avatar
New Contributor

In doing benchmarks between Impala and PrestoDB, we noticed that building pivot tables is quite difficult in Imapala because it does not have Cube operators, like Presto does. Here are two examples in Presto:

 

The CUBE operator generates all possible grouping sets (i.e. a power set) for a given set of columns. For example, the query:`

SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY CUBE (origin_state, destination_state);


is equivalent to:

SELECT origin_state, destination_state, sum(package_weight) FROM shipping GROUP BY GROUPING SETS ( (origin_state, destination_state), (origin_state), (destination_state), ());


Another example is the ROLLUP operator. Full documentation is here:https://prestodb.io/docs/current/sql/select.html.


It's not syntatic sugar because PRESTO perform one table scan for whole query - so using this operators you can build pivot table in one request Impala need to run 2-3 queries.

 

Is there a way in which we can do this with one query / table-scan in Impala instaead of 3? Otherwise the performance becomes terrible on creating any type of pivot table.

Who agreed with this topic