Created on 05-18-2016 07:32 PM - edited 09-16-2022 01:34 AM
In data science, R is commonly used for analytics and data exploration.
When moving to a Hadoop architecture and a connected data platform, a common question is: what happens to my existing R scripts?
You can transition smoothly using the rHadoopClient package for R, which lets you read files from HDFS directly into an R data frame.
To enable this, first install the R package:
install.packages("rHadoopClient")
Because the package has been archived on CRAN, installing by name may fail; in that case you can wget the source tarball directly
wget https://cran.r-project.org/src/contrib/Archive/rHadoopClient/rHadoopClient_0.2.tar.gz
and then install it from the local file:
install.packages("/path/to/rHadoopClient_0.2.tar.gz", repos = NULL, type = "source")
Now you can read a file in using rHadoopClient:
df <- rHadoopClient::read.hdfs("/path/to/data.csv")
That's all you need to get started.
You can change the file-read steps in your existing R scripts to point to HDFS and keep running them as you are used to, as in the sketch below.
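For example, migrating a script is often a one-line change at the read step. This is a minimal sketch; the file name, HDFS path, and the hdfs dfs -put command used to stage the data are assumptions for illustration, not from the article:

# Before: the script read from the local filesystem
# df <- read.csv("/local/data/sales.csv")

# Stage the file in HDFS first, e.g.:
#   hdfs dfs -put /local/data/sales.csv /data/sales.csv

# After: the same script reads from HDFS and still gets a data frame back
df <- rHadoopClient::read.hdfs("/data/sales.csv")

# Everything downstream of the read step is unchanged
summary(df)
head(df)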
Created on 01-15-2019 05:22 AM
Hi Vasilis, does this method you have outlined only work when R is installed on an edge node of the HDP cluster (i.e. R and HDFS are colocated)? I'm exploring how R (say, installed on a workstation) can connect to HDFS running on a separate/remote server(s), in which case I'm unsure how to define the connection details to Hadoop. Are you able to assist?