Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Is there a way to do a countApprox for a DataFrame (not an RDD) in Spark 1.6?

I was wondering how to do an approximate count of a DataFrame without converting it to an RDD in Spark 1.6.

Is there a possible hack, or not?

If anyone has any solutions, please let me know. Thanks.

1 ACCEPTED SOLUTION

@elliot gimple I know it's not really what you want, but there's an `.rdd` method you can call on a DataFrame in 1.6, so you could just call `df.rdd.countApprox(timeout)` on that. Note that `countApprox` needs a timeout in milliseconds; the confidence argument is optional and defaults to 0.95. I'd have to look at the DAG more closely, but I think the overhead is just going to be in converting DataFrame elements to Rows and not generation of the full RDD before `countApprox` is called. I'm not 100% sure about that, though.
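
For anyone landing here later, a rough sketch of the `.rdd` workaround in PySpark. The helper name `approx_count`, the `demo` function, and the default timeout of 1000 ms are just my own placeholders, not anything Spark ships, and I haven't measured how this compares to a plain `df.count()`.

```python
def approx_count(df, timeout_ms=1000, confidence=0.95):
    """Approximate row count of a DataFrame via its underlying RDD.

    approx_count and its defaults are hypothetical names for this sketch.
    """
    # df.rdd exposes the DataFrame as an RDD of Row objects.
    # countApprox returns a possibly incomplete count once timeout_ms has
    # elapsed, even if not every task has finished.
    return df.rdd.countApprox(timeout_ms, confidence)


def demo():
    # Imports live inside demo() so the helper above can be defined
    # without Spark on the Python path.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext("local[2]", "approx-count-demo")
    try:
        sqlContext = SQLContext(sc)
        df = sqlContext.range(0, 100000)  # single-column DataFrame of longs
        print(approx_count(df, timeout_ms=500, confidence=0.9))
    finally:
        sc.stop()
```

Call `demo()` from a session where PySpark is installed. A short timeout can return a count lower than the true one, so pick the timeout based on how long you are willing to wait.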

Thanks, this is what I use, but I wish there was one just for the DataFrame specifically.