Created 06-13-2017 05:09 PM
I was wondering how to do an approximate count of a DataFrame without converting to an RDD in Spark 1.6.
Is there a possible hack, or not?
If anyone has any solutions, please let me know. Thanks.
Created 07-07-2017 03:59 PM
@elliot gimple I know it's not really what you want, but there's an `.rdd` method you can call on a DataFrame in 1.6, so you could just do `df.rdd.countApprox()` on that. I'd have to look at the DAG more closely, but I think the overhead is just in converting DataFrame elements to Rows, not in generating the full RDD before `countApprox` is called -- not 100% sure about that, though.
Created 07-18-2017 05:10 PM
Thanks, this is what I use, but I wish there were one just for the DataFrame specifically.