Support Questions


How to install RHadoop on CDH 5.3



I have configured a 4-node cluster on CDH 5.3 and now want to install RHadoop on it, but I couldn't find documentation on how to do that.


Can you provide some details?


Best regards,





Master Collaborator

I don't think there's anything special to know, beyond what's documented in the RHadoop subprojects. So it's not something that we ship, support or document separately. I have set up the rhadoop libraries with CDH and it's straightforward.


It's really a set of client-side libraries that you install into *R*, not *Hadoop*. However, to run rmr2 you will need R installed locally on all of your Hadoop cluster nodes, since it runs MapReduce jobs that execute R scripts.


I recall that you have to install a bunch of other R packages before installing the rhdfs/rhbase/plyrmr libraries, and I found this in my notes as the set of prerequisites:


  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
"functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
"dplyr", "R.methodsS3", "Hmisc"))
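
Since R has to be set up on every node, a non-interactive variant of the call above may be handy. The following is only a sketch: the package list is the one from the notes, while the mirror URL, the `CRAN_MIRROR` and `RUN_INSTALL` variables, and the `r_install_expr` helper are assumptions made for illustration. Some of these packages may no longer be available from CRAN under the same names.

```shell
#!/bin/sh
# Sketch: build (and optionally run) the prerequisite install without prompts.
# PKGS comes from the notes above; everything else here is an assumed name.
PKGS="Rcpp RJSONIO bitops digest functional reshape2 stringr plyr caTools rJava dplyr R.methodsS3 Hmisc"
CRAN_MIRROR="${CRAN_MIRROR:-https://cloud.r-project.org}"

r_install_expr() {
    # Turn the space-separated names into a quoted, comma-separated R vector.
    quoted=""
    for p in $PKGS; do
        quoted="${quoted:+$quoted, }\"$p\""
    done
    # Passing repos= avoids the interactive mirror prompt.
    printf 'install.packages(c(%s), repos = "%s")\n' "$quoted" "$CRAN_MIRROR"
}

# Only run R when explicitly asked; otherwise just print the expression.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
    Rscript -e "$(r_install_expr)"
else
    r_install_expr
fi
```

Running it with `RUN_INSTALL=1` as root on each node would perform the install; without it, the script only prints the R expression so you can review it first.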


I am working on installing R and RStudio on CDH-5.3.2, but I hit an issue when installing rmr2:

install.packages("/home/ec2-user/R/rmr2_3.3.1.tar.gz", repos = NULL, type="source")


[javac] /tmp/RtmpVvuf0G/R.INSTALL341d4f985503/rmr2/src/hbase-io/src/java/com/dappervision/hbase/mapred/ error: cannot find symbol
[javac] String regionLocation = table.getRegionLocation(startKeys[startPos]).
[javac] ^
[javac] symbol: method getServerAddress()
[javac] location: class HRegionLocation


In the source code, line 164 is like this: 

      String regionLocation = table.getRegionLocation(startKeys[startPos]).



I searched the API and could not find a getServerAddress() method on HRegionLocation.


The problem is that I downloaded rmr2 from this link (as in this instruction), so the issue could be that this tar.gz file is for CDH-4.

Do you know where I can download the source code for CDH-5?




Master Collaborator

I suspect it is because the rmr2 integration code is compatible with an older version of HBase than what is shipped in CDH 5.3.


The link you cited returns a 404 for me, but it seems to me that you are in fact using the latest rmr2 and building from source, which is the right thing to do. I have installed rmr2 on CDH 5.2 before. There aren't special versions you need to find.


I dug out my notes on how I installed several of these libs before. Maybe they help? For example, I installed them differently, with R CMD INSTALL. Of course, you may wish to use more recent versions of these libraries than the ones mentioned in the notes.


Basically you just...

  export HADOOP_CMD=`which hadoop`

and go to it.


Copy packages rmr2_3.1.0.tar.gz rhdfs_1.0.8.tar.gz plyrmr_0.2.0.tar.gz
to nodes at, say, /tmp.

For each node:

  export HADOOP_CMD=`which hadoop`

As root, install R:

  yum install R

This installs version 3.0.2 on my cluster. Run R to install some dependencies:

  R --vanilla

Once in R:

  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
"functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
"dplyr", "R.methodsS3", "Hmisc"))

(choose a mirror that's local when you are prompted)

Install packages, back on the command line:

  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz
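
The per-node steps above could be wrapped in a small script along these lines. This is a sketch under assumptions, not part of the original notes: the `install_rhadoop` function and the `PKG_DIR` and `DRY_RUN` variables are made-up names, and the file names are simply the ones listed above.

```shell
#!/bin/sh
# Sketch: install the RHadoop tarballs in the order used in the notes.
PKG_DIR="${PKG_DIR:-/tmp}"
PKGS="rmr2_3.1.0.tar.gz rhdfs_1.0.8.tar.gz plyrmr_0.2.0.tar.gz"

install_rhadoop() {
    # The notes export HADOOP_CMD before installing, so fail early
    # if it is unset and hadoop cannot be found on PATH.
    HADOOP_CMD="${HADOOP_CMD:-$(command -v hadoop || true)}"
    if [ -z "$HADOOP_CMD" ]; then
        echo "hadoop not found on PATH; set HADOOP_CMD" >&2
        return 1
    fi
    export HADOOP_CMD
    for pkg in $PKGS; do
        if [ ! -f "$PKG_DIR/$pkg" ]; then
            echo "missing $PKG_DIR/$pkg" >&2
            return 1
        fi
        # DRY_RUN=1 prints each command instead of running it.
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "R CMD INSTALL $PKG_DIR/$pkg"
        else
            R CMD INSTALL "$PKG_DIR/$pkg" || return 1
        fi
    done
}
```

Calling `install_rhadoop` with `DRY_RUN=1` set first shows what would be run on a node; stopping at the first missing file or failed install keeps a half-installed node easy to spot.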


sorry, the link is

The rmr2 and rhdfs versions I downloaded are:

-rw-r--r-- 1 ec2-user ec2-user    28287 Apr 10 18:24 plyrmr_0.6.0.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    25105 Apr 10 18:24 rhdfs_1.0.8.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    63087 Apr 10 18:24 rmr2_3.3.1.tar.gz


And my 5-node cluster in EC2:

[root@ip-172-30-2-9 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)

New Contributor



These directions are good.


But when I try to install:

  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz


I get an error:

[root@hostname:username]# R CMD INSTALL rmr2_3.1.0.tar.gz
Error in getOctD(x, offset, len) : invalid octal digit




tar -tvf rmr2_3.1.0.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now


I have tried different repos, but I'm at a loss.


Any thoughts would be helpful.






Master Collaborator

I suspect it's some issue with the version of tar on your system? BSD vs GNU? Just a guess. That, or maybe a corrupted file? The latest rmr2 archive uncompressed OK for me on OS X.
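
To test the corrupted-file guess, one option is to look at the first bytes of the download. A gzip stream starts with the bytes 0x1f 0x8b, so a file that begins with anything else (for example an HTML page saved under a .tar.gz name) would produce the "not in gzip format" message. The sketch below assumes nothing about where the file came from; `is_gzip` and `check_tarball` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: check that a downloaded package really is a gzip archive.
is_gzip() {
    # Read the first two bytes as hex; gzip files start with 1f 8b.
    magic="$(od -An -tx1 -N 2 "$1" | tr -d ' \n')"
    [ "$magic" = "1f8b" ]
}

check_tarball() {
    if is_gzip "$1"; then
        echo "$1: gzip archive"
    else
        # Show the start of the file; a web page is usually obvious here.
        echo "$1: not gzip; first lines follow" >&2
        head -n 5 "$1" >&2
        return 1
    fi
}
```

For example, `check_tarball rmr2_3.1.0.tar.gz` either confirms the archive or prints the first few lines so you can see what was actually downloaded.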

New Contributor
Found the right repo. Thanks!



When I try to install rmr2_3.3.1.tar.gz on CDH 5.7.4, I get the following error. Can you help?


Thank you very much,

Garry

line 163: [: missing `]'
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2 as hadoop home
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/lib/hbase as hbase home

Copying libs into local build directory
ls: cannot access /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/hadoop-*-core.jar: No such file or directory
Cannot find hadoop-streaming jar in hadoop homei
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/lib64/R/library/rmr2/libs
** R
** byte-compile and prepare package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘quickcheck’
Note: no visible binding for '<<-' assignment to '.Last'
Note: no visible binding for '<<-' assignment to '.Last'
** help
*** installing help indices
  converting help for package ‘rmr2’
    finding HTML links ... done
    bigdataobject                           html
    dfs.empty                               html
    equijoin                                html
    fromdfstodfs                            html
    hadoop-setting                          html
    keyval                                  html                          html
    mapreduce                               html
    rmr-package                             html
    rmr.options                             html
    rmr.sample                              html
    rmr.str                                 html
    scatter                                 html
    status                                  html
    tomaptoreduce                           html
    vsum                                    html
** building package indices
** testing if installed package can be loaded
Warning: S3 methods ‘gorder.default’, ‘gorder.factor’, ‘’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
* DONE (rmr2)

New Contributor



I am following the steps from the following link for RHadoop installation on Cloudera


Will it work for Cloudera 1.6?



