Percentile aggregation

New Contributor

We're using a small (6-node) HDFS/Impala cluster to develop an analytics project. The one feature we're really missing is percentile aggregation, so that alongside the avg, min, max, and std_dev values we can also see the 95th and 99th percentiles for grouped results.
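To make the target output concrete, here is a minimal sketch in plain Python of the per-group summary described above. It assumes the rows have already been fetched as `(group_key, value)` pairs and fit in memory. The function names `nearest_rank_percentile` and `summarize` are hypothetical. It uses the nearest-rank definition of a percentile, which is not necessarily what Hive's percentile() or percentile_approx() compute.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def nearest_rank_percentile(sorted_values, p):
    """Return the p-th percentile (0 < p <= 100) of an ascending list,
    using the nearest-rank method (no interpolation)."""
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty list")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    # Nearest rank: the smallest 1-based rank covering p percent of the values.
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[rank - 1]


def summarize(rows):
    """Group (key, value) pairs by key and compute summary statistics,
    including the 95th and 99th percentiles, for each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    summary = {}
    for key, values in groups.items():
        values.sort()
        summary[key] = {
            "avg": mean(values),
            "min": values[0],
            "max": values[-1],
            "std_dev": pstdev(values),  # population standard deviation
            "p95": nearest_rank_percentile(values, 95),
            "p99": nearest_rank_percentile(values, 99),
        }
    return summary
```

Calling `summarize(rows)` returns one dictionary per group key. This only works when the data can be pulled out of the cluster, which is why an in-database aggregate would be preferable.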

 

In the latest "What's next for Impala..." blog post (July 2015), the "...addition of new SQL and vendor-specific language extensions and data types based on customer feedback" is planned for later in 2016. Will percentile aggregation functions be included, equivalent to Hive's percentile() or percentile_approx() functions?

 

Alternatively, User-Defined Aggregation Functions (UDAFs) have been available since Impala 2.3, released in November. Has anyone written a percentile aggregation function that they would be happy to share or open source?

 

Does anyone use a third-party technology / library to calculate percentiles with Impala?

 

Many thanks for any help you can provide.

1 REPLY

Re: Percentile aggregation

New Contributor