Support Questions

Is there a way to do an approximate count (countApprox) for a DataFrame (not an RDD) in Spark 1.6?

I was wondering how to do an approximate count of a DataFrame without converting it to an RDD in Spark 1.6.

Is there a possible hack for this, or not?

If anyone has any solutions, please let me know. Thanks.

1 ACCEPTED SOLUTION

@elliot gimple I know it's not really what you want, but there is an `.rdd` method you can call on a DataFrame in 1.6, so you could just do `df.rdd.countApprox()` on that. I'd have to look at the DAG more closely, but I think the overhead is just in converting DataFrame elements to Rows, not in generating the full RDD before `countApprox` is called -- not 100% sure about that, though.

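For reference, here is a minimal sketch of that approach in Scala on Spark 1.6, assuming an existing DataFrame `df`; the timeout and confidence values below are only illustrative.

```scala
// A minimal sketch (Scala, Spark 1.6), assuming an existing DataFrame `df`.
// The timeout and confidence values are illustrative.
import org.apache.spark.partial.{BoundedDouble, PartialResult}

// Drop down to the underlying RDD[Row] and use RDD.countApprox.
// timeout is in milliseconds; confidence is the probability that the
// true count lies within the returned [low, high] bounds.
val approx: PartialResult[BoundedDouble] =
  df.rdd.countApprox(timeout = 10000L, confidence = 0.95)

// initialValue is the estimate available when the timeout expires (or the
// job finishes early); getFinalValue() would block until the exact count.
val estimate: BoundedDouble = approx.initialValue
println(s"count ~ ${estimate.mean} (low=${estimate.low}, high=${estimate.high})")
```

Note that `countApprox` still launches a job; the approximation comes from returning whatever partial result is available when the timeout expires.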

2 REPLIES


Thanks, this is what I use, but I wish there were one for the DataFrame specifically.