
groupBy and filter data in PySpark


I need to group by date and count nulls on the primary key in PySpark.


@Gundrathi babu

You can do this with the groupBy and filter operations you mentioned in your question.


from pyspark.sql import functions as F

# Group by date and count rows where the primary key column ("id") is null.
grp = df.groupBy("date").agg(F.count(F.when(F.col("id").isNull(), 1)).alias("null_count"))

# Keep only the dates that actually have null primary keys.
fil = grp.filter(F.col("null_count") > 0)

fil will hold each date together with its null count. Hope it helps! I don't have a running Spark cluster handy to verify the code, but this flow should help you solve the issue.
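For anyone who also lacks a Spark cluster to test on, the same group-and-count-nulls logic can be sketched in plain Python; the sample rows and column names here are illustrative, not from the original question.

```python
from collections import defaultdict

# Sample rows: (date, primary_key) pairs; None marks a missing key.
rows = [
    ("2023-01-01", "a1"),
    ("2023-01-01", None),
    ("2023-01-02", None),
    ("2023-01-02", None),
    ("2023-01-03", "b7"),
]

# Group by date and count null primary keys, mirroring
# df.groupBy("date").agg(count(when(col("id").isNull(), 1))).
null_counts = defaultdict(int)
for date, pk in rows:
    if pk is None:
        null_counts[date] += 1

# Keep only dates that actually have nulls, like the filter step.
result = {d: c for d, c in null_counts.items() if c > 0}
print(result)  # {'2023-01-01': 1, '2023-01-02': 2}
```

This is just the counting idea; on real data you would let Spark do the grouping so the work is distributed across the cluster.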
