How do I compute a DataFrame's record count without re-running the DataFrame? Can this information be pulled from any Spark stats table?
A few options I am aware of are listed below (a code sketch follows the list):
1. dataframe.cache() -- I don't want to store the result in memory.
2. dataframe.describe("col").show() -- again, it re-runs the DataFrame to get the count.
3. dataframe.count() -- again, it re-runs the DataFrame to get the count. (Note that count() returns a Long, so there is no .show() on it.)
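For reference, a rough Scala sketch of the options above. The SparkSession name, input path, and column name are placeholders, not from the original post:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("count-options").getOrCreate()
val dataframe = spark.read.parquet("/path/to/input")  // placeholder input path

// Option 1: cache() keeps the computed rows in memory so later actions reuse them
dataframe.cache()

// Option 2: describe() computes count/mean/stddev/min/max for the column,
// which triggers a job over the data
dataframe.describe("col").show()

// Option 3: count() is an action; it triggers a job and returns a Long
val recordCount = dataframe.count()
println(s"record count: $recordCount")
```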
Can you try dataframe.persist(StorageLevel.DISK_ONLY)? This stores the result on disk instead of in memory, so later actions reuse it. A sketch follows.
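A minimal sketch of that suggestion in Scala; the output path is a placeholder:

```scala
import org.apache.spark.storage.StorageLevel

// Persist to disk so the first action materializes the rows there;
// subsequent actions read the persisted data instead of recomputing the lineage.
dataframe.persist(StorageLevel.DISK_ONLY)

val recordCount = dataframe.count()        // first action: computes and writes to disk
dataframe.write.parquet("/path/to/output") // second action: reuses the persisted rows
println(s"record count: $recordCount")
```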
Thanks for the reply, but that will spend time writing the DataFrame to disk. I am looking for something else: is there a way the DataFrame count can be written to the logs via a Spark logger option?
The intention is to avoid re-running the DataFrame.
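One workaround that might fit (it is not a built-in Spark logger setting, but it avoids a separate count job) is to increment a LongAccumulator as rows pass through the job you already run, then log the accumulator value afterwards. A rough Scala sketch; spark, dataframe, and the output path are placeholders:

```scala
// Count rows as a side effect of the job that already processes the data,
// so no extra pass over the DataFrame is needed just for the count.
val rowCount = spark.sparkContext.longAccumulator("rowCount")

val countedRdd = dataframe.rdd.map { row =>
  rowCount.add(1)   // incremented on the executors as each row is processed
  row
}
val counted = spark.createDataFrame(countedRdd, dataframe.schema)

counted.write.parquet("/path/to/output")   // the single "real" action

// The value can be printed or routed to whatever logger the driver uses.
// Caveat: accumulator updates inside transformations can be over-counted
// if tasks are retried, so treat this as an approximate, logging-only count.
println(s"record count: ${rowCount.value}")
```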