We have a requirement in PySpark where an aggregated value from a SQL query needs to be stored in a variable, and that variable is then used as a selection criterion in a subsequent query. For example: get max(sales_date), then fetch the rows from the table for that date. Could you please share some input on this?
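You can retrieve the value of the aggregate query like this (here spark is your SparkSession):
aggr_value = spark.sql("your query").collect()[0][0]
collect() returns a list of Row objects, so [0][0] picks the first column of the first row. You can then use aggr_value in subsequent queries like any other Python variable.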