How to install RHadoop on CDH 5.3



I have configured a 4-node cluster on CDH 5.3 and now I want to install RHadoop on it, but I couldn't find documentation on how to do that.


Can you provide some details?


Best regards,





Master Collaborator

I don't think there's anything special to know, beyond what's documented in the RHadoop subprojects. So it's not something that we ship, support or document separately. I have set up the rhadoop libraries with CDH and it's straightforward.


It's really a set of client-side libraries that you install into *R*, not *Hadoop*. However, to run rmr2 you will need R installed locally on all of your Hadoop cluster nodes, since it runs MapReduce jobs that execute R scripts.


I recall that you have to install a bunch of other R packages before installing the rhdfs/rhbase/plyrmr libraries, and I found this in my notes as the set of prerequisites:


  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
    "functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
    "dplyr", "R.methodsS3", "Hmisc"))


I am working on installing R and RStudio on CDH 5.3.2, but I ran into an issue when installing rmr2:

install.packages("/home/ec2-user/R/rmr2_3.3.1.tar.gz", repos = NULL, type="source")


[javac] /tmp/RtmpVvuf0G/R.INSTALL341d4f985503/rmr2/src/hbase-io/src/java/com/dappervision/hbase/mapred/ error: cannot find symbol
[javac] String regionLocation = table.getRegionLocation(startKeys[startPos]).
[javac] ^
[javac] symbol: method getServerAddress()
[javac] location: class HRegionLocation


In the source code, line 164 is like this: 

      String regionLocation = table.getRegionLocation(startKeys[startPos]).



I searched the API and could not find a getServerAddress() method on HRegionLocation.


I downloaded rmr2 from this link (as in this instruction), so the issue could be that this tar.gz file is for CDH 4.

Do you know where I can download the source code for CDH 5?




Master Collaborator

I suspect it is because the rmr2 integration code is compatible with an older version of HBase than what is shipped in CDH 5.3.
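
One way to test that suspicion is to ask the JDK's javap tool whether the HRegionLocation class in your HBase jar still declares getServerAddress(). This is only a sketch: has_method is a helper name made up here, and the jar path in the usage comment is a placeholder you would need to fill in for your own parcel layout.

```shell
# Succeeds if the javap listing on stdin declares a method with the given name.
has_method() {
  grep -q "[[:space:]]$1("
}

# Usage (HBASE_JAR is a hypothetical variable pointing at your HBase client jar):
#   javap -classpath "$HBASE_JAR" org.apache.hadoop.hbase.HRegionLocation \
#     | has_method getServerAddress && echo "present" || echo "missing"
```

If the method is reported missing, the compile error above is what you would expect from code written against an older HBase API.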


The link you cited returns a 404 for me, but it seems to me that you are in fact using the latest rmr2 and building from source, which is the right thing to do. I have installed rmr2 on CDH 5.2 before. There aren't special versions you need to find.


I dug out my notes to myself on how I installed several of these libs before. Maybe they help? For example, I installed them differently, with R CMD INSTALL. Of course you may wish to use more recent versions of these libraries than the ones mentioned in the notes.


Basically you just...

  export HADOOP_CMD=`which hadoop`

and go to it.
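
If hadoop is not on the PATH, the backtick form above quietly sets HADOOP_CMD to an empty string. A slightly more defensive variant might look like this; resolve_cmd is a name invented for this sketch, not something from rmr2 or CDH.

```shell
# Print where a command resolves on PATH, or fail with a message on stderr.
resolve_cmd() {
  command -v "$1" || { echo "$1 not found on PATH" >&2; return 1; }
}

# Export HADOOP_CMD only when the lookup actually succeeded.
if HADOOP_CMD="$(resolve_cmd hadoop)"; then
  export HADOOP_CMD
fi
```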


Copy packages rmr2_3.1.0.tar.gz rhdfs_1.0.8.tar.gz plyrmr_0.2.0.tar.gz
to nodes at, say, /tmp.

For each node:

  export HADOOP_CMD=`which hadoop`

As root, install R:

  yum install R

This installs version 3.0.2 on my cluster. Run R to install some dependencies:

  R --vanilla

Once in R:

  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
    "functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
    "dplyr", "R.methodsS3", "Hmisc"))

(choose a mirror that's local when you are prompted)
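
When repeating this on several nodes, the mirror prompt gets tedious. Passing repos explicitly to install.packages avoids it, so the step can be run non-interactively through Rscript. A minimal sketch: r_install_expr and CRAN_REPO are names made up here, and the default mirror URL is an assumption you should replace with one close to your cluster.

```shell
# Build an install.packages() call for the given package names.
# CRAN_REPO is a hypothetical override; the default mirror is an assumption.
r_install_expr() {
  repo=${CRAN_REPO:-https://cloud.r-project.org}
  list=""
  for pkg in "$@"; do
    # Append each name as a quoted, comma-separated R string.
    list="${list:+$list, }\"$pkg\""
  done
  printf 'install.packages(c(%s), repos = "%s")\n' "$list" "$repo"
}

# Usage:
#   Rscript -e "$(r_install_expr Rcpp RJSONIO bitops digest functional reshape2 \
#     stringr plyr caTools rJava dplyr R.methodsS3 Hmisc)"
```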

Install packages, back on the command line:

  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz
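
Since the copy and install steps have to be repeated on every node, it can help to generate the commands from a list of hostnames and review them before running anything. The sketch below only prints commands. plan_install and the nodes file are hypothetical, and it assumes passwordless ssh/scp access with enough privileges to run R CMD INSTALL on each node.

```shell
# Print (not run) the copy and install commands for each node and package.
# $1 is a file with one hostname per line; remaining arguments are package tarballs,
# given in the order they should be installed.
plan_install() {
  nodes_file=$1; shift
  while read -r node || [ -n "$node" ]; do
    [ -n "$node" ] || continue          # skip blank lines
    for pkg in "$@"; do
      echo "scp $pkg $node:/tmp/"
      # -n keeps ssh from reading stdin when the plan is later run as a script
      echo "ssh -n $node R CMD INSTALL /tmp/$(basename "$pkg")"
    done
  done < "$nodes_file"
}

# Usage:
#   plan_install nodes.txt rmr2_3.1.0.tar.gz rhdfs_1.0.8.tar.gz plyrmr_0.2.0.tar.gz > plan.sh
#   (review plan.sh, then: sh plan.sh)
```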


sorry, the link is

The rmr2, rhdfs and plyrmr versions I downloaded are:

-rw-r--r-- 1 ec2-user ec2-user    28287 Apr 10 18:24 plyrmr_0.6.0.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    25105 Apr 10 18:24 rhdfs_1.0.8.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    63087 Apr 10 18:24 rmr2_3.3.1.tar.gz


And my 5-node cluster in EC2 runs:

[root@ip-172-30-2-9 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)




These directions are good.


But when I try to install:

  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz


I get an error:

  [root@hostname:username]# R CMD INSTALL rmr2_3.1.0.tar.gz
  Error in getOctD(x, offset, len) : invalid octal digit




tar -tvf rmr2_3.1.0.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now


I have tried different repos, but I'm at a loss.


Any thoughts would be helpful.






Master Collaborator

I suspect it's some issue in the version of tar you may have on your system? BSD vs Gnu? Just a guess. That or maybe a corrupted file? The latest rmr2 archive uncompressed OK for me on OS X.
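
A quick way to tell a corrupted or mislabeled download from a tar problem is to test the file with gzip itself before handing it to R CMD INSTALL. A small sketch; is_gzip and check_archive are names made up for illustration.

```shell
# Succeeds only if the file passes gzip's integrity test.
is_gzip() {
  gzip -t "$1" 2>/dev/null
}

# Report on a download before trying to install it.
check_archive() {
  if is_gzip "$1"; then
    echo "ok: $1"
  else
    echo "not gzip: $1"
    return 1
  fi
}

# Usage:
#   check_archive rmr2_3.1.0.tar.gz
```

If the check fails, looking at the first few lines of the file (for example with head) may show what was actually downloaded. A saved web page in place of the raw archive is one possible cause, though that is a guess and not something confirmed here.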

Found the right repo. Thanks!



When I try to install rmr2_3.3.1.tar.gz on CDH 5.7.4, I get the following error. Can you help?


Thank you very much,

Garry

line 163: [: missing `]'
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2 as hadoop home
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/lib/hbase as hbase home

Copying libs into local build directory
ls: cannot access /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/hadoop-*-core.jar: No such file or directory
Cannot find hadoop-streaming jar in hadoop homei
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/lib64/R/library/rmr2/libs
** R
** byte-compile and prepare package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘quickcheck’
Note: no visible binding for '<<-' assignment to '.Last'
Note: no visible binding for '<<-' assignment to '.Last'
** help
*** installing help indices
  converting help for package ‘rmr2’
    finding HTML links ... done
    bigdataobject                           html
    dfs.empty                               html
    equijoin                                html
    fromdfstodfs                            html
    hadoop-setting                          html
    keyval                                  html                          html
    mapreduce                               html
    rmr-package                             html
    rmr.options                             html
    rmr.sample                              html
    rmr.str                                 html
    scatter                                 html
    status                                  html
    tomaptoreduce                           html
    vsum                                    html
** building package indices
** testing if installed package can be loaded
Warning: S3 methods ‘gorder.default’, ‘gorder.factor’, ‘’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
* DONE (rmr2)
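
Reading the log above: the build script did not find the Hadoop jars directly under the parcel root and skipped the optional HBase IO classes, but the run still ends with "* DONE (rmr2)", so the package itself appears to have installed. As far as I know, rmr2 looks up the streaming jar at run time through the HADOOP_STREAMING environment variable, so locating that jar inside the parcel is a reasonable next step. A sketch, with find_streaming_jar being a made-up helper and the jar's location inside the parcel not confirmed here:

```shell
# Print the first hadoop-streaming jar found under the given directory.
# Prints nothing if no matching jar exists.
find_streaming_jar() {
  find "$1" -name 'hadoop-streaming*.jar' 2>/dev/null | head -n 1
}

# Usage (parcel path taken from the log above):
#   HADOOP_STREAMING="$(find_streaming_jar /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2)"
#   export HADOOP_STREAMING
```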