Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Get an end date, (n) workdays from a given date, using a calendar table

avatar
New Contributor

s300570_0-1695908985505.png

Impala version : impalad version 2.12.0-cdh5.16.2

As shown above, I have two tables; Customer table and Calendar Table, wich columns I describe below:

Customer Table:

[Customer] : Customer ID

[Date] : Creation date

[Num_Days] : Number of working days to calculate the [next_wkday] in the query result.

Calendar Table:

[civil_util] : when (1) -> workday; when (0) -> non working day

 

I need to get the query_result with the three columns of Customer Table and a calculated date [next_wkday], representing the number of working days [Num_Days] after the Creation date, jumping the zeros in the [civil_util] column.

 

The query below calculates the [next_wkday] using the Lead() function. But is not a solution because the offset parameter must be a constant, and we need to use the [Num_Days] value for each Creation Date:

 

 

 

select *,
            Lead (to_date(ref_date),5) OVER (ORDER BY to_date(ref_date)) AS next_wkday,
            datediff(Lead (to_date(ref_date),5) OVER (ORDER BY 
 to_date(ref_date)),ref_date) as days_diff
        from cd_estruturais.calendario_datas
        where
            ref_date >= to_date(now())
            and civil_util = 1
            limit 1

 

 

So, we need to find another solution.

 

I need to say, for performance puposes, that the Customer table has 4 billion records approximatly.

 

Can anyone help please?

6 REPLIES 6

avatar
Community Manager

@s300570 Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our experts @bbreak @cravani  who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
New Contributor

My best regards to you, Diana.
I'll just keep waiting for help from those experts, or there must be any action from me?

Leopoldo Fernandes
Portugal

avatar
Community Manager

@JoseManuel @Shmoo Any insights here?


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
New Contributor

Hi, Can anyone help please?

avatar

@s300570,

If I understood your requirement correctly, you might try something like:

WITH WorkingDaysCTE AS (
  SELECT
    ref_date,
    ROW_NUMBER() OVER (ORDER BY ref_date) AS row_num
  FROM
    cd_estruturais.calendario_datas
  WHERE
    ref_date >= CURRENT_DATE -- You can adjust the starting date as needed
    AND civil_util = 1
)

SELECT
  c1.*,
  w1.ref_date AS next_wkday,
  DATEDIFF(w1.ref_date, c1.ref_date) AS Num_Days
FROM
  cd_estruturais.calendario_datas c1
JOIN
  WorkingDaysCTE w1 ON c1.ref_date <= w1.ref_date
                    AND w1.row_num = (
                      SELECT MIN(row_num)
                      FROM WorkingDaysCTE w2
                      WHERE w2.row_num > w1.row_num
                    )
WHERE
  c1.ref_date >= CURRENT_DATE
  AND c1.civil_util = 1
LIMIT 1;


Now, just as info, as I no longer have access to an Impala/Hive system, the above query was written as a standard SQL, so you might want to double check the syntax.

In terms of explanations:

  • The common table expression (CTE) named WorkingDaysCTE is used to filter the dates with civil_util = 1 and assign a row number to each date based on their order.
  • The main query then joins the calendario_datas table with the CTE on the condition that the date in the calendar table is less than or equal to the date in the CTE.
  • The subquery in the SELECT clause is used to find the minimum row number greater than the current row number in the CTE. This helps in getting the next working day.
  • The DATEDIFF function is used to calculate the number of days between the ref_date and the next_wkday.

avatar
Community Manager

@s300570 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: