
between and greatest used in pyspark dataframe




I am writing PySpark code that creates multiple data frames and uses them to build subsequent data frames. One of them uses a filter like:

to_date(transaction_date) BETWEEN greatest(to_date(loco_contract_start_date), to_date(status_asg_start_date))
                          AND least(to_date(loco_contract_end_date), to_date(status_asg_end_date))

This is where I think the problem is: every time I run it, the data frame gives me a different count.

Any idea what the solution might be?


