Created 09-28-2023 07:23 AM
Impala version : impalad version 2.12.0-cdh5.16.2
As shown above, I have two tables; Customer table and Calendar Table, wich columns I describe below:
Customer Table:
[Customer] : Customer ID
[Date] : Creation date
[Num_Days] : Number of working days to calculate the [next_wkday] in the query result.
Calendar Table:
[civil_util] : when (1) -> workday; when (0) -> non working day
I need to get the query_result with the three columns of Customer Table and a calculated date [next_wkday], representing the number of working days [Num_Days] after the Creation date, jumping the zeros in the [civil_util] column.
The query below calculates the [next_wkday] using the Lead() function. But is not a solution because the offset parameter must be a constant, and we need to use the [Num_Days] value for each Creation Date:
select *,
            Lead (to_date(ref_date),5) OVER (ORDER BY to_date(ref_date)) AS next_wkday,
            datediff(Lead (to_date(ref_date),5) OVER (ORDER BY 
 to_date(ref_date)),ref_date) as days_diff
        from cd_estruturais.calendario_datas
        where
            ref_date >= to_date(now())
            and civil_util = 1
            limit 1
So, we need to find another solution.
I need to say, for performance puposes, that the Customer table has 4 billion records approximatly.
Can anyone help please?
Created 09-28-2023 09:41 AM
@s300570 Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our experts @bbreak @cravani  who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,Created on 09-29-2023 01:02 AM - edited 09-29-2023 03:46 AM
My best regards to you, Diana.
I'll just keep waiting for help from those experts, or there must be any action from me?
Leopoldo Fernandes
Portugal
Created 09-29-2023 11:43 AM
@JoseManuel @Shmoo Any insights here?
Regards,
Diana Torres,Created 10-09-2023 04:05 AM
Hi, Can anyone help please?
Created 10-09-2023 06:22 AM
@s300570,
If I understood your requirement correctly, you might try something like:
WITH WorkingDaysCTE AS (
  SELECT
    ref_date,
    ROW_NUMBER() OVER (ORDER BY ref_date) AS row_num
  FROM
    cd_estruturais.calendario_datas
  WHERE
    ref_date >= CURRENT_DATE -- You can adjust the starting date as needed
    AND civil_util = 1
)
SELECT
  c1.*,
  w1.ref_date AS next_wkday,
  DATEDIFF(w1.ref_date, c1.ref_date) AS Num_Days
FROM
  cd_estruturais.calendario_datas c1
JOIN
  WorkingDaysCTE w1 ON c1.ref_date <= w1.ref_date
                    AND w1.row_num = (
                      SELECT MIN(row_num)
                      FROM WorkingDaysCTE w2
                      WHERE w2.row_num > w1.row_num
                    )
WHERE
  c1.ref_date >= CURRENT_DATE
  AND c1.civil_util = 1
LIMIT 1;
Now, just as info, as I no longer have access to an Impala/Hive system, the above query was written as a standard SQL, so you might want to double check the syntax.
In terms of explanations:
Created 10-12-2023 11:02 AM
@s300570 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres, 
					
				
				
			
		
