How to install RHadoop on CDH 5.3

Explorer

Hi,

I have configured a 4-node cluster on CDH 5.3 and now want to install RHadoop on it, but I couldn't find documentation on how to do that.

Can you provide some details?

Best regards,

Master Collaborator

I don't think there's anything special to know, beyond what's documented in the RHadoop subprojects. So it's not something that we ship, support or document separately. I have set up the rhadoop libraries with CDH and it's straightforward.

 

It's really a set of client-side libraries that you install into *R*, not *Hadoop*. However, to run rmr2 you will need R installed locally on all of your Hadoop cluster nodes, since it runs MapReduce jobs that execute R scripts.
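
One way to confirm that R is present on every node is a small loop over the hosts. This is only a sketch: `check_r_on_node` is a hypothetical helper, the host names in the usage comment are placeholders, and it assumes passwordless ssh to each node. A node that cannot be reached is reported the same way as a node without R.

```shell
# check_r_on_node HOST: hypothetical helper, not part of RHadoop or CDH.
# Prints "HOST ok" if Rscript is on the remote PATH, otherwise
# "HOST missing or unreachable".
# REMOTE_SHELL defaults to ssh; it can be overridden to try the sketch
# without a cluster.
check_r_on_node() {
  if ${REMOTE_SHELL:-ssh} "$1" 'command -v Rscript >/dev/null 2>&1'; then
    echo "$1 ok"
  else
    echo "$1 missing or unreachable"
  fi
}

# Hypothetical usage (host names are placeholders):
# for host in node1 node2 node3 node4; do check_r_on_node "$host"; done
```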

 

I recall that you have to install a bunch of other R packages before installing the rhdfs/rhbase/plyrmr libraries, and I found this in my notes as the set of prerequisites:

 

  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
                     "functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
                     "dplyr", "R.methodsS3", "Hmisc"))

Explorer

I am working on installing R and RStudio on CDH 5.3.2, but I ran into an issue when installing rmr2:

install.packages("/home/ec2-user/R/rmr2_3.3.1.tar.gz", repos = NULL, type="source")

 

[javac] /tmp/RtmpVvuf0G/R.INSTALL341d4f985503/rmr2/src/hbase-io/src/java/com/dappervision/hbase/mapred/TypedBytesTableInputFormatBase.java:164: error: cannot find symbol
[javac] String regionLocation = table.getRegionLocation(startKeys[startPos]).
[javac] ^
[javac] symbol: method getServerAddress()
[javac] location: class HRegionLocation

 

In the source code, line 164 reads:

      String regionLocation = table.getRegionLocation(startKeys[startPos]).
        getServerAddress().getHostname();

 

I searched the API and could not find a getServerAddress() method on HRegionLocation.

 

I downloaded rmr2 from https://github.com/RevolutionAnalytics/RHadoop/wiki (following these instructions: https://ashokharnal.wordpress.com/2014/01/16/installing-r-rhadoop-and-rstudio-over-cloudera-hadoop-e...), so the issue could be that this tar.gz file is meant for CDH 4.

Do you know where I can download the source code for CDH 5?

Thanks,

Bin

Master Collaborator

I suspect it is because the rmr2 integration code is compatible with an older version of HBase than what is shipped in CDH 5.3.

 

The link you cited returns a 404 for me, but it seems that you are in fact using the latest rmr2 and building from source, which is the right thing to do. I have installed rmr2 on CDH 5.2 before. There aren't special versions you need to find.

I dug out my notes on how I installed several of these libs before. Maybe they help? For example, I installed them differently, with R CMD INSTALL. Of course, you may wish to use more recent versions of these libraries than the ones mentioned in the notes.

 

Basically you just...

  export HADOOP_CMD=`which hadoop`
  R
  ...
  library(plyrmr)

and go to it.


HOW TO

Copy packages rmr2_3.1.0.tar.gz rhdfs_1.0.8.tar.gz plyrmr_0.2.0.tar.gz
to nodes at, say, /tmp.

For each node:

  export HADOOP_CMD=`which hadoop`
  export HADOOP_STREAMING=`ls /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming-*.jar`

As root, install R:

  yum install R

This installs version 3.0.2 on my cluster. Run R to install some dependencies:

  R --vanilla

Once in R:

  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
                     "functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
                     "dplyr", "R.methodsS3", "Hmisc"))

(choose a mirror that's local when you are prompted)

Install packages, back on the command line:

  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz
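
Before starting R on a node, it may help to confirm that the two exported variables actually point at real files. The following is a minimal sketch, not part of the notes above: `check_rhadoop_env` is a hypothetical helper name. One thing it catches is the `ls` glob matching more than one streaming jar, in which case HADOOP_STREAMING holds several paths and is no longer a single file.

```shell
# check_rhadoop_env: hypothetical helper, not part of RHadoop.
# Returns 0 only if HADOOP_CMD is an executable file and
# HADOOP_STREAMING is a single existing file.
check_rhadoop_env() {
  rc=0
  if [ -z "${HADOOP_CMD:-}" ] || [ ! -x "$HADOOP_CMD" ]; then
    echo "HADOOP_CMD is unset or not executable: '${HADOOP_CMD:-}'" >&2
    rc=1
  fi
  if [ -z "${HADOOP_STREAMING:-}" ] || [ ! -f "$HADOOP_STREAMING" ]; then
    echo "HADOOP_STREAMING is unset or not a single file: '${HADOOP_STREAMING:-}'" >&2
    rc=1
  fi
  return "$rc"
}

# Hypothetical usage, after the two export lines:
# check_rhadoop_env && R --vanilla
```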

Explorer

Sorry, the link is https://ashokharnal.wordpress.com/2014/01/16/installing-r-rhadoop-and-rstudio-over-cloudera-hadoop-e...

The plyrmr, rhdfs, and rmr2 versions I downloaded are:

  -rw-r--r-- 1 ec2-user ec2-user    28287 Apr 10 18:24 plyrmr_0.6.0.tar.gz
  -rw-r--r-- 1 ec2-user ec2-user    25105 Apr 10 18:24 rhdfs_1.0.8.tar.gz
  -rw-r--r-- 1 ec2-user ec2-user    63087 Apr 10 18:24 rmr2_3.3.1.tar.gz

And my 5-node cluster in EC2 runs:

  [root@ip-172-30-2-9 ~]# cat /etc/redhat-release
  Red Hat Enterprise Linux Server release 6.5 (Santiago)

Explorer

Hello,

 

These directions are good.

 

But when I try to install

  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz

 


I get an error:

  [root@hostname:username]# R CMD INSTALL rmr2_3.1.0.tar.gz
  Error in getOctD(x, offset, len) : invalid octal digit

Listing the archive also fails:

  tar -tvf rmr2_3.1.0.tar.gz

  gzip: stdin: not in gzip format
  tar: Child returned status 1
  tar: Error is not recoverable: exiting now

https://github.com/RevolutionAnalytics/rmr2/tree/enterprise/build

I have tried different repos, but I'm at a loss.

Any thoughts would be helpful.

Master Collaborator

I suspect it's an issue with the version of tar on your system (BSD vs. GNU)? Just a guess. Or maybe the file is corrupted? The latest rmr2 archive uncompressed OK for me on OS X: https://github.com/RevolutionAnalytics/rmr2/releases
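
A quick way to tell a damaged or mis-saved download from a tar incompatibility is to test the file with gzip alone before handing it to R. This is a sketch under the assumption that the file is simply not a gzip archive (for example, a web page saved under a .tar.gz name); `is_gzip` is a hypothetical helper name.

```shell
# is_gzip FILE: succeed only if FILE passes gzip's built-in integrity test.
# Hypothetical helper; it only wraps the standard `gzip -t` check.
is_gzip() {
  gzip -t "$1" 2>/dev/null
}

# Hypothetical usage with the archive name from this thread:
# is_gzip /tmp/rmr2_3.1.0.tar.gz && R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
```

If the check fails, re-downloading the archive is likely a better next step than switching tar implementations.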

Explorer
Found the right repo. Thanks!

Explorer

Hi,

When I try to install rmr2_3.3.1.tar.gz on CDH 5.7.4, I get the following error. Can you help?

 

Thank you very much,

Garry

 

build_linux.sh: line 163: [: missing `]'
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2 as hadoop home
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/lib/hbase as hbase home

Copying libs into local build directory
ls: cannot access /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/hadoop-*-core.jar: No such file or directory
/ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/hadoop-core-2.6.0-mr1-cdh5.7.4.jar
Cannot find hadoop-streaming jar in hadoop homei
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/lib64/R/library/rmr2/libs
** R
** byte-compile and prepare package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘quickcheck’
Note: no visible binding for '<<-' assignment to '.Last'
Note: no visible binding for '<<-' assignment to '.Last'
** help
*** installing help indices
  converting help for package ‘rmr2’
    finding HTML links ... done
    bigdataobject                           html
    dfs.empty                               html
    equijoin                                html
    fromdfstodfs                            html
    hadoop-setting                          html
    keyval                                  html
    make.io.format                          html
    mapreduce                               html
    rmr-package                             html
    rmr.options                             html
    rmr.sample                              html
    rmr.str                                 html
    scatter                                 html
    status                                  html
    tomaptoreduce                           html
    vsum                                    html
** building package indices
** testing if installed package can be loaded
Warning: S3 methods ‘gorder.default’, ‘gorder.factor’, ‘gorder.data.frame’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
* DONE (rmr2)
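
The log ends with "* DONE (rmr2)", so the package itself was installed; only the HBase IO classes were skipped. The build script looked for the Hadoop jars directly under the parcel root, whereas the notes earlier in this thread found the streaming jar under lib/hadoop-mapreduce. A small sketch to see where the streaming jar actually lives on a node follows. `find_streaming_jars` is a hypothetical helper, and whether pointing the build at that location fixes the HBase IO step is not confirmed here.

```shell
# find_streaming_jars DIR: list hadoop-streaming jars anywhere below DIR,
# sorted. Hypothetical helper. Symlinks are listed too, since no -type
# filter is applied.
find_streaming_jars() {
  find "$1" -name 'hadoop-streaming*.jar' 2>/dev/null | sort
}

# Hypothetical usage with the parcel path from the log above:
# find_streaming_jars /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2
```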