We are trying to connect to Spark with spark_connect from RStudio Desktop, but we cannot figure out how to configure the following connection parameters:
master | Spark cluster URL to connect to. Use "local" to connect to a local instance of Spark installed via spark_install.
spark_home | The path to a Spark installation. Defaults to the path provided by the SPARK_HOME environment variable. If SPARK_HOME is defined, it will always be used unless the version parameter is specified to force the use of a locally installed version.
method | The method used to connect to Spark. The default connection method is "shell", which connects using spark-submit; use "livy" to perform remote connections using HTTP, or "databricks" when using a Databricks cluster.
config | Custom configuration for the generated Spark connection.
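For reference, here is a minimal sketch of how these four parameters are passed in one call. It assumes the sparklyr package is installed (the parameter descriptions above match its spark_connect documentation). The wrapper name connect_sketch and the default values are hypothetical and only for illustration; the right master and spark_home depend on the cluster.

```r
# Sketch only: wraps sparklyr::spark_connect() with the four parameters above.
# connect_sketch is a hypothetical helper name, not part of sparklyr.
connect_sketch <- function(master = "local",                      # assumed default; a cluster URL in practice
                           spark_home = Sys.getenv("SPARK_HOME"), # must contain bin/spark-submit
                           method = "shell",                      # "shell" launches via spark-submit
                           config = sparklyr::spark_config()) {   # default sparklyr configuration
  sparklyr::spark_connect(
    master     = master,
    spark_home = spark_home,
    method     = method,
    config     = config
  )
}

# Example call (not run here; requires a working Spark installation):
# sc <- connect_sketch(master = "local")
```

With method = "shell", spark_home has to be the directory whose bin subdirectory holds spark-submit, so that is probably the first value to verify.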
The connection fails with:
Error in system2(file.path(spark_home, "bin", "spark-submit"), "--version", :
'"/opt/cloudera/parcels/SPARK2/bin/spark-submit"' not found
We receive the same error when we set SPARK_HOME to point to Spark 1.6.
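The error shows that the path being checked is file.path(spark_home, "bin", "spark-submit"). The following base R sketch tests whether that file exists under a given directory, which may help narrow down which spark_home value is valid. The helper name has_spark_submit is hypothetical, and the candidate directories are only the ones mentioned in this post; they would need to be replaced with the paths on the actual machine.

```r
# Hypothetical helper: TRUE if <spark_home>/bin/spark-submit exists.
# This mirrors the path built in the error message above.
has_spark_submit <- function(spark_home) {
  submit_path <- file.path(spark_home, "bin", "spark-submit")
  file.exists(submit_path)
}

# Candidate directories (assumptions taken from this post; adjust as needed).
candidates <- c(Sys.getenv("SPARK_HOME"), "/opt/cloudera/parcels/SPARK2")

# Report, for each candidate, whether spark-submit was found there.
for (home in candidates) {
  cat(sprintf("%s -> spark-submit found: %s\n", home, has_spark_submit(home)))
}
```

A directory that reports FALSE here would be expected to produce the same "not found" error when passed as spark_home.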