groupBy and filter data in pyspark


I need to group by date and count the nulls on the primary key in PySpark.


@Gundrathi babu

You can do this with groupBy and filter in PySpark, as you mentioned in your question.

Sample:

# Keep only the rows whose primary key is null, then count them per date
grp = df.filter(df["id"].isNull()).groupBy("date").count()

grp will hold one row per date with the number of null primary keys. Hope it helps! I don't have a running Spark cluster handy to verify the code, but this flow should help you solve the issue.