Take vs Count performance

I was going through some pages for Spark practices and found this page:


If I had to check whether a dataframe has at least 10 entries, would it be better to use df.count() >= 10 or df.take(10).length >= 10?
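To make the comparison concrete, here is a minimal Scala sketch of the two checks. The object name `AtLeastNRows`, the helpers `viaCount` and `viaTake`, the local SparkSession, and the `spark.range` dataframe are all stand-ins for illustration, not my real job.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AtLeastNRows {
  // count() computes the total number of rows, then we compare against n.
  def viaCount(df: DataFrame, n: Int): Boolean = df.count() >= n

  // take(n) returns at most n rows to the driver, so getting n rows back
  // means the dataframe has at least n rows.
  def viaTake(df: DataFrame, n: Int): Boolean = df.take(n).length >= n

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the two checks out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("take-vs-count")
      .getOrCreate()

    // Stand-in dataframe; the real ones come from complex transformations.
    val df = spark.range(0L, 1000L).toDF("id")

    println(s"via count: ${viaCount(df, 10)}")
    println(s"via take:  ${viaTake(df, 10)}")

    spark.stop()
  }
}
```

Both helpers should always agree on the answer; the only question is how much work Spark does to get there.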


I tried both methods and didn't see a difference in performance, so I'm wondering what the reasoning behind the post I linked is.
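A timing comparison of this kind can be sketched roughly as follows. `timed` is a hypothetical helper, and the filtered `spark.range` dataframe is a stand-in, so the numbers it prints say nothing about my actual workload. A single run like this is also noisy, so treat it as a rough check rather than a benchmark.

```scala
import org.apache.spark.sql.SparkSession

object TimeTakeVsCount {
  // Hypothetical helper: runs `body` once and returns (result, elapsed ms).
  def timed[T](body: => T): (T, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("time-take-vs-count")
      .getOrCreate()

    // Stand-in dataframe. It is not cached, so each action below
    // re-runs the filter from scratch.
    val df = spark.range(0L, 5000000L).filter("id % 3 = 0").toDF("id")

    val (byCount, countMs) = timed(df.count() >= 10)
    val (byTake, takeMs) = timed(df.take(10).length >= 10)

    println(f"count: $byCount in $countMs%.1f ms")
    println(f"take:  $byTake in $takeMs%.1f ms")

    spark.stop()
  }
}
```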


These are large dataframes with some fairly complex transformations applied before count/take is called, so either action can take a very long time to complete.