Support Questions

Find answers, ask questions, and share your expertise

Some mathematical (e.g. atan2) and analytical functions (e.g. WIDTH_BUCKET) are not available in hive , which is availabe in other databases . Could you recommend some open source libraries that I can leverage?

avatar

Some mathematical (e.g. atan2) and analytical functions (e.g. WIDTH_BUCKET) are not available in hive , which is availabe in other databases . Could you recommend some open source libraries that I can leverage?

1 ACCEPTED SOLUTION

avatar
Super Guru

@Mahipal Ramidi

As Sindhu suggested, you can write your UDF, specifically leveraging Java math library. NTILE divides ordered data set into number of buckets and assigns appropriate bucket number to each row. It can be used to divide rows into equal sets and assign a number to each row. WIDTH_BUCKET, while not far off from NTILE, here we can actually supply the range of values (start and end values), it takes the ranges and splits it into N groups.

You can actually write WIDTH_BUCKET in SQL if you know your number of buckets and assume that is static for your histogram. Her is an example:

SELECT whatever, CASE WHEN(whatever)>=${hiveconf:mymin} AND (t.tiv)<=${hiveconf:mymax}

THEN CASEWHENfloor((whatever)/((${hiveconf:mymax}–

${hiveconf:mymin})/${ hiveconf:mybuckets} )+ 1)>${ hiveconf:mybuckets}

THEN floor((t.tiv)/((${hiveconf:mymax}-${ hiveconf:mymin})/${ hiveconf:mybuckets})) ELSE floor((whatever)/((${hiveconf:mymax}-

${hiveconf:mymin})/${hiveconf:mybuckets})+1) END ELSE(${hiveconf:mybuckets})+1 END AS whateverlabel FROM(whatever table or sql)

Even is not exactly WIDTH_BUCKET, https://developers.google.com/chart/interactive/docs/gallery/histogram provides a bucketing javascript function useful for histograms. Check the histograms section.

Another good resource: https://developers.google.com/api-client-library/java/apis/analytics/v3

I believe that I knew a good Java library that had WIDTH_BUCKET among other analytical functions, but look at Google resources mentioned above. Most likely, you could leverage those and add yours, custom UDFs.

View solution in original post

2 REPLIES 2

avatar
@Mahipal Ramidi

You can have a Hive UDF to achieve the functions. You can alternatively use NTILE which is supported in Hive.

Please refer to below link for current list of Analytical functions:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

avatar
Super Guru

@Mahipal Ramidi

As Sindhu suggested, you can write your UDF, specifically leveraging Java math library. NTILE divides ordered data set into number of buckets and assigns appropriate bucket number to each row. It can be used to divide rows into equal sets and assign a number to each row. WIDTH_BUCKET, while not far off from NTILE, here we can actually supply the range of values (start and end values), it takes the ranges and splits it into N groups.

You can actually write WIDTH_BUCKET in SQL if you know your number of buckets and assume that is static for your histogram. Her is an example:

SELECT whatever, CASE WHEN(whatever)>=${hiveconf:mymin} AND (t.tiv)<=${hiveconf:mymax}

THEN CASEWHENfloor((whatever)/((${hiveconf:mymax}–

${hiveconf:mymin})/${ hiveconf:mybuckets} )+ 1)>${ hiveconf:mybuckets}

THEN floor((t.tiv)/((${hiveconf:mymax}-${ hiveconf:mymin})/${ hiveconf:mybuckets})) ELSE floor((whatever)/((${hiveconf:mymax}-

${hiveconf:mymin})/${hiveconf:mybuckets})+1) END ELSE(${hiveconf:mybuckets})+1 END AS whateverlabel FROM(whatever table or sql)

Even is not exactly WIDTH_BUCKET, https://developers.google.com/chart/interactive/docs/gallery/histogram provides a bucketing javascript function useful for histograms. Check the histograms section.

Another good resource: https://developers.google.com/api-client-library/java/apis/analytics/v3

I believe that I knew a good Java library that had WIDTH_BUCKET among other analytical functions, but look at Google resources mentioned above. Most likely, you could leverage those and add yours, custom UDFs.