Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

LAG, LEAD, and Other Analytic Functions in HiveQL

Contributor

I have manually installed Cloudera's Hadoop and Cloudera's Hive RPM on RHEL. I have Sqooped data into Hive and can run normal HiveQL queries on the data fine. However, I cannot run LAG or LEAD on it.

This site suggests that it is possible: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/language_manual/ptf-window.html

But this site says that Cloudera doesn't provide the version of Hive that comes with LAG, LEAD, etc.: http://www.justinjworkman.com/big-data/hive-0-11-0-on-cloudera/#building

Is there any documentation on how to run these types of functions in HiveQL?

Thank you.

1 ACCEPTED SOLUTION

Mentor
LAG, LEAD, and the other windowing and analytic functions are available in the version of Apache Hive shipped with CDH5 (currently in beta). The Hive shipped with CDH4 is based on a stable upstream release that predates these features, which were added in Apache Hive 0.11.

You could use Justin's guide to get a custom build of a newer version of Hive running on your CDH4 cluster; this is possible to do.
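
For reference, once you are on a Hive build with windowing support, a LAG/LEAD query might look roughly like the sketch below. The table `page_views` and its columns are hypothetical and only there for illustration; substitute your own Sqooped table and columns.

```sql
-- Hypothetical table: page_views(user_id STRING, view_time TIMESTAMP, url STRING)
-- For each row, fetch the URL the same user viewed just before and just after it.
SELECT
  user_id,
  view_time,
  url,
  -- LAG looks one row back within the user's rows, ordered by time
  LAG(url, 1)  OVER (PARTITION BY user_id ORDER BY view_time) AS previous_url,
  -- LEAD looks one row ahead within the same ordering
  LEAD(url, 1) OVER (PARTITION BY user_id ORDER BY view_time) AS next_url
FROM page_views;
```

With no default value given, previous_url is NULL on each user's first row and next_url is NULL on each user's last row. On the stock CDH4 Hive this query is expected to fail, since the OVER clause is not supported there.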
