Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Impala sliding window function

Impala sliding window function

New Contributor

I have a data set with two fields - a date and a user name.  I want to find two different things.

  • For each month, how many users are brand new and have never appeared in the data before?
  • For each month, how many users have not been appeared in the past 12 months?

I'm familiar with window functions and aggregates, but can't wrap my brain around this one for some reason.  Any help appreciated.

 

Thank you!

1 REPLY 1

Re: Impala sliding window function

Master Collaborator
One idea is to construct the cartesian product of users and months, then
left join it with the user/date table on user+month. Maybe you want to
aggregate your base table by month first...

Then I think you could have a last_value(date) window function partitioned
by user and ordered by date. You can then check whether that date is NULL
or if it's < 12 months before that month.

Maybe this isn't the most efficient way but I think it could be made to
work.