LAG, LEAD, and Other Analytic Functions in HiveQL

Contributor

I have manually installed Cloudera's Hadoop and Cloudera's Hive RPM on RHEL. I have Sqooped data into Hive and can run normal HiveQL queries on the data fine. However, I cannot run LAG or LEAD on it.

 

This site suggests that it is possible: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/language_manual/ptf-window.html

 

But this site says that Cloudera doesn't provide the version of Hive that comes with LAG, LEAD, etc.: http://www.justinjworkman.com/big-data/hive-0-11-0-on-cloudera/#building

 

Is there any documentation on how to run these types of functions in HiveQL?

 

Thank you.

1 ACCEPTED SOLUTION

Mentor
LAG, LEAD, and the other analytic functions are available in the base version of Apache Hive shipped with CDH5 (currently in beta). The Hive shipped with CDH4 is based on a stable upstream release that came out before these features were added.

You could use Justin's guide to get a custom build of a newer version of Hive running on your CDH4 cluster. This is possible to do.
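
Once you are on a Hive version that includes the windowing functions, a query along these lines should work. This is only a minimal sketch: the table `sales` and its columns `customer_id`, `order_date`, and `amount` are hypothetical names chosen for illustration, not anything from your data.

```sql
-- Hypothetical table: sales(customer_id STRING, order_date STRING, amount DOUBLE)
-- For each customer, show every order next to the previous and next order amount.
SELECT
  customer_id,
  order_date,
  amount,
  -- Value of amount from the row one position earlier within the same customer
  LAG(amount, 1)  OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount,
  -- Value of amount from the row one position later within the same customer
  LEAD(amount, 1) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_amount
FROM sales;
```

The first row in each partition has no previous row, so `prev_amount` is NULL there, and likewise `next_amount` is NULL on the last row. On a Hive build without windowing support, the `OVER` clause will fail to parse.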
