between and greatest used in pyspark dataframe

I am writing PySpark code that creates multiple DataFrames and uses them to build subsequent DataFrames. I am using a filter that goes like this:

to_date(transaction_date) BETWEEN greatest(to_date(loco_contract_start_date), to_date(status_asg_start_date)) AND least(to_date(loco_contract_end_date), to_date(status_asg_end_date))

This is where I think the problem is: every time I run it, the DataFrame gives me a different count.
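For reference, a minimal plain-Python sketch of what the filter expression is meant to compute per row (the column names and sample dates below are taken from the question or invented for illustration; in Spark, `greatest` picks the latest of the start dates and `least` picks the earliest of the end dates, so the filter keeps rows where the transaction date falls inside the overlap of the two date windows):

```python
from datetime import date

def in_overlap_window(transaction_date, start_dates, end_dates):
    # greatest(start dates): the window cannot begin before the latest start
    window_start = max(start_dates)
    # least(end dates): the window cannot extend past the earliest end
    window_end = min(end_dates)
    # BETWEEN is inclusive on both ends
    return window_start <= transaction_date <= window_end

# Hypothetical sample row: contract runs Jan 1 - Dec 31,
# status assignment runs Mar 1 - Sep 30.
starts = [date(2020, 1, 1), date(2020, 3, 1)]
ends = [date(2020, 12, 31), date(2020, 9, 30)]

print(in_overlap_window(date(2020, 6, 1), starts, ends))  # inside the overlap
print(in_overlap_window(date(2020, 2, 1), starts, ends))  # before the later start
```

Note that if any of the date columns can be null, the comparison result is null in Spark SQL and the row is silently dropped by the filter, which is one thing worth checking when counts look unstable.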

Any idea what might be the solution?

Regards

Souveek
