
Assigning aggregate value from a pySpark Query/data frame to a variable


We have a requirement in PySpark where an aggregated value from a SQL query must be stored in a variable, and that variable is then used as a selection criterion in a subsequent query. For example: get max(sales_date), then fetch the rows from the table for that date. Could you please share some guidance on this?

Regards,

Phaneendra


Re: Assigning aggregate value from a pySpark Query/data frame to a variable


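You can retrieve the value of the aggregate query as shown below. Here "your query" is a placeholder for the aggregate expression; collect() returns a list of Row objects, so [0][0] picks the first column of the first row: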

aggr_value = df.select("your query").collect()[0][0]

You can then use it in subsequent queries like any other Python variable.
