
Is there a way to do a countApprox for a DataFrame (not an RDD) in Spark 1.6?


I was wondering how to do an approximate count of a DataFrame in Spark 1.6 without converting it to an RDD.

Is there a possible hack, or not?

If anyone has any solutions, please let me know. Thanks.

1 ACCEPTED SOLUTION


Re: Is there a way to do a count Approx for a dataframe (not rdd)in spark 1.6

@elliot gimple I know it's not really what you want, but there's an `.rdd` method you can call on a DataFrame in 1.6, so you could just do `df.rdd.countApprox(timeout)` on that (`countApprox` needs a timeout in milliseconds and optionally takes a confidence level). I'd have to look at the DAG more closely, but I think the overhead is just going to be in converting DataFrame elements to Rows, not in generating the full RDD before `countApprox` is called. I'm not 100% sure about that, though.
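For anyone who wants to wrap this up, here is a minimal PySpark sketch of the workaround. The helper name `approx_count` and its default values are made up for illustration and are not part of Spark; the only Spark calls it relies on are the DataFrame's `rdd` attribute and `RDD.countApprox(timeout, confidence)`, which is marked experimental.

```python
def approx_count(df, timeout_ms=1000, confidence=0.95):
    """Approximate row count of a Spark DataFrame via its underlying RDD.

    approx_count is a hypothetical helper name, not a Spark API.
    timeout_ms is how long to wait, in milliseconds, before returning
    whatever estimate is available at that point.
    """
    # df.rdd exposes the DataFrame as an RDD of Row objects.
    # RDD.countApprox(timeout, confidence) is the experimental RDD method.
    return df.rdd.countApprox(timeout_ms, confidence)


# Usage, assuming an existing DataFrame named df:
#   estimate = approx_count(df, timeout_ms=2000)
```

If the job finishes within the timeout the result should match a regular count; otherwise you get whatever estimate was available when the timeout expired, so the timeout is the main knob to tune.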



Re: Is there a way to do a count Approx for a dataframe (not rdd)in spark 1.6

Thanks, this is what I use, but I wish there was one just for the DataFrame specifically.
