06-07-2019 01:43 AM
Since Jupyter Notebooks are not compatible with Cloudera 5.14, we are evaluating RStudio so that our Data Scientists can run their programs in Spark. I have seen that RStudio Server would be needed to connect to a remote Spark cluster. Is that an environment supported by Cloudera? Would it be possible to run RStudio Server outside the cluster? Is there any way to use RStudio Desktop with a remote Spark cluster?
On a related note, if we want to use Python and Spark, would it be supported to use, for example, Spyder connected remotely to a kernel running in the Cloudera cluster?
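To make the Python scenario concrete, here is a rough sketch of what we have in mind: a PySpark session started in client mode on YARN from a machine outside the cluster. It is only an illustration under assumptions, not a tested setup. The path `/etc/hadoop/conf` and the helper names `remote_spark_settings` and `start_session` are hypothetical, and it presumes the remote machine has pyspark installed, a copy of the cluster's Hadoop/YARN client configuration, and network access to the cluster.

```python
import os

# Hypothetical location of the cluster's client configuration,
# copied to the machine where the IDE runs.
HADOOP_CONF_DIR = "/etc/hadoop/conf"


def remote_spark_settings(app_name, conf_dir=HADOOP_CONF_DIR):
    """Collect the settings a client-mode session on YARN would need."""
    return {
        # Spark locates the YARN ResourceManager through these variables.
        "env": {"HADOOP_CONF_DIR": conf_dir, "YARN_CONF_DIR": conf_dir},
        "master": "yarn",
        "conf": {
            "spark.app.name": app_name,
            # Client mode: the driver runs on the remote machine.
            "spark.submit.deployMode": "client",
        },
    }


def start_session(settings):
    """Start a SparkSession from the settings (requires pyspark)."""
    # Imported lazily so the helper above can be used without pyspark.
    from pyspark.sql import SparkSession

    os.environ.update(settings["env"])
    builder = SparkSession.builder.master(settings["master"])
    for key, value in settings["conf"].items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


settings = remote_spark_settings("remote-ide-test")
# spark = start_session(settings)  # run only where the cluster is reachable
```

In client mode the driver lives on the remote machine, so the cluster nodes would also need to be able to connect back to it.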
Thanks in advance,
06-07-2019 04:35 AM
You can connect to a Spark cluster by using sparklyr.
Also, if you are still in the evaluation phase of deciding which tools to use, you may want to compare Cloudera's Data Science Workbench as well, if you haven't already done so.
Hope that helps.
06-10-2019 06:50 AM
Thank you @Consult. Our idea is to use sparklyr to connect to Spark in our cluster, from either RStudio Desktop or RStudio Server. In our case, RStudio Server is outside the cluster. Which steps should we follow to connect to a remote Spark cluster?
Cloudera Data Science Workbench is an option we may evaluate in the future. Regarding this, does it need separate node(s) to run? Could it run on an existing edge node?