As I understand it, "SparkR is just an interface to write your own Spark programs in R". Does this imply that Spark does not have its own R packages/libraries but needs to fall back on other R implementations, i.e. packages from CRAN? That would mean we need to write R programs in SparkR and call libraries from CRAN. Our problem: for text classification, maximum entropy ("maxEntropy") gives higher accuracy than naive Bayes ("naiveBayes"), but "maxEntropy" is not available in MLlib, so the only option is to use an R library ("RTextTools"). Any example of Spark invoking an R library/package would be useful.
The documentation quoted below says "includePackage" is an option. Can SparkR use this to add the "RTextTools" CRAN package?
Using existing R packages
SparkR also allows easy use of existing R packages inside closures. The includePackage command can be used to indicate packages that should be loaded before every closure is executed on the cluster. For example, to use the Matrix package in a closure applied on each partition of an RDD, you could run
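The snippet the docs show at that point looks roughly like the following (reconstructed from memory of the AMPLab SparkR README, so the exact code may differ; it assumes `sc` and `rdd` already exist):

```r
# Sketch based on the old AMPLab SparkR API (includePackage, lapplyPartition).
generateSparse <- function(part) {
  # sparseMatrix() comes from the Matrix package loaded via includePackage
  sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 3), x = c(1, 2, 3))
}
includePackage(sc, Matrix)
sparseMat <- lapplyPartition(rdd, generateSparse)
```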
As you have mentioned, SparkR is just an interface, which means SparkR by itself has no packages and has to depend on other implementations, i.e. CRAN packages. Correct?
Another definition says:
SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. SparkR allows easy use of existing R packages inside closures. Spark computations are automatically distributed across all the cores and machines available on the Spark cluster, so this package can be used to analyze terabytes of data using R.
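To make the question concrete, here is a minimal sketch of what I am hoping is possible. It assumes the old AMPLab SparkR API (`sparkR.init`, `includePackage`, `parallelize`, `lapplyPartition`, `collect`), that RTextTools is installed on every worker node, and that `someTexts` and the labels are placeholders for real data:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")

# Load RTextTools before each closure runs on the workers
includePackage(sc, RTextTools)

classifyPartition <- function(part) {
  texts  <- unlist(part)                              # document texts in this partition
  labels <- rep(c(1, 2), length.out = length(texts))  # placeholder labels
  dtm    <- create_matrix(texts)                      # document-term matrix (RTextTools)
  half   <- floor(length(texts) / 2)
  container <- create_container(dtm, labels,
                                trainSize = 1:half,
                                testSize  = (half + 1):length(texts),
                                virgin    = FALSE)
  model <- train_model(container, "MAXENT")           # maximum-entropy classifier
  classify_model(container, model)                    # predicted labels + probabilities
}

rdd <- parallelize(sc, as.list(someTexts), 2)         # someTexts: a character vector (placeholder)
results <- collect(lapplyPartition(rdd, classifyPartition))
```

Is this the intended way to combine includePackage with a CRAN package like RTextTools, or is there a better pattern?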