05-11-2017 09:01 AM
I manage an application server running RHEL and RStudio Server Pro.
All I need for now is to use R to read data from our Hadoop cluster and write data back to it, so that I can process the data further on my application server. I found a lot of information online about running R on the Hadoop nodes themselves, but that is not what I am looking for right now.
Any ideas? What drivers and packages do I need to install to make this happen?
09-06-2017 02:55 AM
I understand that you would like to interact with data on the cluster from within R.
One idea is to use HttpFS with the R curl package:
Apache Hadoop HttpFS is a service that provides HTTP access to HDFS.
HttpFS has a REST HTTP API supporting all HDFS filesystem operations (both read and write).
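As a rough sketch of what that could look like from R, the helpers below build an HttpFS URL and read or write a CSV file with the curl package. The function names (httpfs_url, hdfs_read_csv, hdfs_write_csv) are made up for illustration, and the host, path, and user are placeholders. The sketch assumes HttpFS is listening on its default port 14000 and that the cluster accepts simple user.name authentication; a Kerberos-secured cluster would need SPNEGO options on the curl handle instead.

```r
# Build an HttpFS REST URL (hypothetical helper, not part of any package).
# Extra query parameters such as data = "true" can be passed through `...`.
httpfs_url <- function(host, path, op, user, port = 14000, ...) {
  params <- c(list(op = op, user.name = user), list(...))
  query <- paste0(names(params), "=",
                  vapply(params, as.character, character(1)),
                  collapse = "&")
  sprintf("http://%s:%d/webhdfs/v1%s?%s",
          host, as.integer(port), utils::URLencode(path), query)
}

# Read a CSV file from HDFS into a data frame (op=OPEN).
hdfs_read_csv <- function(host, path, user, ...) {
  res <- curl::curl_fetch_memory(httpfs_url(host, path, "OPEN", user))
  if (res$status_code != 200) stop("HttpFS returned HTTP ", res$status_code)
  read.csv(text = rawToChar(res$content), ...)
}

# Write a data frame to HDFS as CSV (op=CREATE).
# Assumption: HttpFS expects data=true and an octet-stream content type
# when the file body is sent in the same PUT request.
hdfs_write_csv <- function(df, host, path, user) {
  csv <- paste(capture.output(write.csv(df, row.names = FALSE)),
               collapse = "\n")
  h <- curl::new_handle()
  curl::handle_setopt(h, customrequest = "PUT", postfields = charToRaw(csv))
  curl::handle_setheaders(h, "Content-Type" = "application/octet-stream")
  url <- httpfs_url(host, path, "CREATE", user,
                    data = "true", overwrite = "true")
  res <- curl::curl_fetch_memory(url, handle = h)
  if (!res$status_code %in% c(200, 201)) {
    stop("HttpFS returned HTTP ", res$status_code)
  }
  invisible(TRUE)
}
```

With a reachable HttpFS host, usage would be along the lines of df <- hdfs_read_csv("httpfs.example.com", "/user/alice/data.csv", "alice"), where the host, path, and user are again placeholders. Only the curl package needs to be installed on the application server; no Hadoop client libraries are required for this approach.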
Common HttpFS use cases are:
A more ad-hoc solution would be to use the Cloudera Data Science Workbench. Have you given it a try?