
In Data Science, R is commonly used for analytics and data exploration.

When moving to a Hadoop architecture and a connected data platform, a common question is: what happens to my existing R scripts?

You can transition smoothly to Hadoop using the rHadoop package for R, which allows you to read files from HDFS and load the data back into an R data frame.

To enable this, you first need to get the R package:
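The code block for this step did not survive extraction. A minimal sketch, assuming the rhdfs package from the RHadoop project: rhdfs is not on CRAN, so the CRAN step here only covers its rJava dependency, while the package itself ships as a source tarball.

```r
# Sketch (assumed dependency): rhdfs requires rJava, which is on CRAN.
# The rhdfs package itself is distributed as a source tarball.
install.packages("rJava")
```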


You can also wget the package:
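The download command itself was lost; a sketch, assuming the rhdfs 1.0.8 source tarball from the RevolutionAnalytics GitHub repository (URL and version number are assumptions — pick the release that matches your cluster):

```shell
# Assumed URL and version: fetch the rhdfs source tarball
# from the RevolutionAnalytics GitHub repository.
wget https://github.com/RevolutionAnalytics/rhdfs/raw/master/build/rhdfs_1.0.8.tar.gz
```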


and then install it:
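The install command is also missing; a sketch, assuming the tarball is named rhdfs_1.0.8.tar.gz and that the Hadoop client lives at /usr/bin/hadoop (both are assumptions for your environment):

```shell
# Assumed filename and Hadoop location: rhdfs needs HADOOP_CMD
# set so it can find the Hadoop client at load time.
export HADOOP_CMD=/usr/bin/hadoop
R CMD INSTALL rhdfs_1.0.8.tar.gz
```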


Now you can read a file in using the rHadoopClient:
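The read example did not survive extraction either. A minimal sketch, assuming the rhdfs API and a CSV file at an illustrative HDFS path (`/tmp/sample.csv`); the HADOOP_CMD location is likewise an assumption:

```r
# Assumptions: HADOOP_CMD location and the HDFS path are illustrative.
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
library(rhdfs)
hdfs.init()

# Read the file's lines from HDFS, then parse them into a data frame
lines <- hdfs.read.text.file("/tmp/sample.csv")
df <- read.csv(textConnection(lines))
```

From here, `df` behaves like any other R data frame, so the rest of an existing script can stay unchanged.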


That's all you need to get started.

This lets you change the file-read steps in your R scripts to point at HDFS while still running the scripts as you are used to doing.


Hi Vasilis, does this method you have outlined only work when R is installed on an Edge node of the HDP cluster (i.e. R and HDFS are colocated)? I'm exploring how R (say installed in a workstation) can connect to HDFS running on a separate/remote server(s), in which case, I'm unsure how to define the connection details to Hadoop. Are you able to assist?

Version history
Last update:
‎09-16-2022 01:34 AM