How to install RHadoop on CDH 5.3

Explorer

Hi,

I have configured a 4-node cluster on CDH 5.3, and now I want to install RHadoop on it, but I couldn't find documentation on how to do that.

Can you provide some details?

 

Best regards,

 

Master Collaborator

I don't think there's anything special to know, beyond what's documented in the RHadoop subprojects. So it's not something that we ship, support or document separately. I have set up the rhadoop libraries with CDH and it's straightforward.

 

It's really a set of client-side libraries that you install into *R*, not *Hadoop*. However, to run rmr2 you will need R installed locally on all of your Hadoop cluster nodes, since it runs MapReduce jobs that execute R scripts.
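
A quick way to confirm that R is present on every node might look like the sketch below. It assumes passwordless SSH from the machine you run it on. The helper name nodes_missing_r and the host names in the usage line are hypothetical, not part of RHadoop or CDH.

```shell
# Sketch only: print every host on which `R` is not found on the PATH.
# nodes_missing_r is a hypothetical helper; it assumes key-based SSH access.
nodes_missing_r() {
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ! ssh -o BatchMode=yes "$host" 'command -v R >/dev/null 2>&1'; then
      printf '%s\n' "$host"
    fi
  done
}

# Usage (host names are placeholders for your own worker nodes):
# nodes_missing_r node1 node2 node3 node4
```

Any host that gets printed still needs R before rmr2 jobs can run there. An unreachable host is printed too, since the ssh call fails for it as well.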

 

I recall that you have to install a bunch of other R packages before installing the rhdfs/rhbase/plyrmr libraries, and I found this in my notes as the set of prerequisites:

 

  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
                     "functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
                     "dplyr", "R.methodsS3", "Hmisc"))

Explorer

I am working on installing R and RStudio on CDH 5.3.2, but I ran into an issue when installing rmr2:

install.packages("/home/ec2-user/R/rmr2_3.3.1.tar.gz", repos = NULL, type="source")

 

[javac] /tmp/RtmpVvuf0G/R.INSTALL341d4f985503/rmr2/src/hbase-io/src/java/com/dappervision/hbase/mapred/TypedBytesTableInputFormatBase.java:164: error: cannot find symbol
[javac] String regionLocation = table.getRegionLocation(startKeys[startPos]).
[javac] ^
[javac] symbol: method getServerAddress()
[javac] location: class HRegionLocation

 

In the source code, line 164 is like this: 

      String regionLocation = table.getRegionLocation(startKeys[startPos]).
        getServerAddress().getHostname();

 

I searched the API and could not find a getServerAddress() method on HRegionLocation.

 

The problem may be that I downloaded rmr2 from this link: https://github.com/RevolutionAnalytics/RHadoop/wiki (as described in these instructions: https://ashokharnal.wordpress.com/2014/01/16/installing-r-rhadoop-and-rstudio-over-cloudera-hadoop-e...), so this tar.gz file may be for CDH 4.

Do you know where I can download the source code for CDH 5?

 

thanks

Bin

Master Collaborator

I suspect it is because the rmr2 integration code is compatible with an older version of HBase than what is shipped in CDH 5.3.
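
If you want to check this on your own cluster, one option is to ask the JDK's javap tool whether the class on the HBase classpath still declares the method. This is only a sketch: hbase_has_method is a hypothetical helper, and it assumes both the `hbase` launcher and `javap` are on the PATH of the node where you run it.

```shell
# Sketch only: succeed if CLASS on the HBase classpath declares METHOD.
# hbase_has_method is a hypothetical helper, not part of HBase or rmr2.
hbase_has_method() {
  class="$1"
  method="$2"
  # `hbase classpath` prints the classpath used by the installed HBase;
  # javap lists the public members of the class found on that classpath.
  javap -cp "$(hbase classpath)" "$class" | grep -q "[[:space:]]$method("
}

# Usage, with the class and method from the compile error:
# hbase_has_method org.apache.hadoop.hbase.HRegionLocation getServerAddress \
#   && echo "method present" || echo "method missing in this HBase version"
```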

 

The link you cited returns a 404 for me, but it seems to me that you are in fact using the latest rmr2 and building from source, which is the right thing to do. I have installed rmr2 on CDH 5.2 before. There aren't special versions you need to find.

 

I dug out my notes on how I installed several of these libraries before. Maybe they help? For example, I installed them differently, with R CMD. Of course, you may wish to use more recent versions of these libraries than the ones mentioned in the notes.

 

Basically you just...

  export HADOOP_CMD=`which hadoop`
  R
  ...
  library(plyrmr)

and go to it.


HOW TO

Copy packages rmr2_3.1.0.tar.gz rhdfs_1.0.8.tar.gz plyrmr_0.2.0.tar.gz
to nodes at, say, /tmp.

For each node:

  export HADOOP_CMD=`which hadoop`
  export HADOOP_STREAMING=`ls /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming-*.jar`
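
If the glob matches nothing, the `ls` form above leaves HADOOP_STREAMING empty and only prints an error. A slightly more defensive lookup could be sketched as follows. find_streaming_jar is a hypothetical helper, and the directory in the usage lines is simply the parcel path from these notes, which may differ on your install.

```shell
# Sketch only: print the first hadoop-streaming jar found in a directory,
# or fail with a message if none exists.
# find_streaming_jar is a hypothetical helper, not part of CDH or RHadoop.
find_streaming_jar() {
  dir="$1"
  # If the glob matches nothing, the pattern stays literal and -f is false.
  for jar in "$dir"/hadoop-streaming-*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no hadoop-streaming jar found in $dir" >&2
  return 1
}

# Usage on a parcel-based install:
# export HADOOP_CMD=$(command -v hadoop)
# export HADOOP_STREAMING=$(find_streaming_jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce)
```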

As root, install R:

  yum install R

This installs version 3.0.2 on my cluster. Run R to install some dependencies:

  R --vanilla

Once in R:

  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
                     "functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
                     "dplyr", "R.methodsS3", "Hmisc"))

(choose a mirror that's local when you are prompted)

Install packages, back on the command line:

  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz
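
Since this has to be repeated on each node, the three install commands can be wrapped so that they stop at the first failure. This is a sketch rather than anything RHadoop ships; install_pkgs is a hypothetical helper, and the tarball names are the ones from the notes above.

```shell
# Sketch only: install R source tarballs in the order given,
# stopping at the first missing file or failed install.
# install_pkgs is a hypothetical helper, not part of R or RHadoop.
install_pkgs() {
  for pkg in "$@"; do
    if [ ! -f "$pkg" ]; then
      echo "missing package file: $pkg" >&2
      return 1
    fi
    R CMD INSTALL "$pkg" || return 1
  done
}

# Usage, keeping the same order as the notes:
# install_pkgs /tmp/rmr2_3.1.0.tar.gz /tmp/rhdfs_1.0.8.tar.gz /tmp/plyrmr_0.2.0.tar.gz
```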

Explorer

Sorry, the link is https://ashokharnal.wordpress.com/2014/01/16/installing-r-rhadoop-and-rstudio-over-cloudera-hadoop-e...

The rmr2 and rhdfs versions I downloaded are:

-rw-r--r-- 1 ec2-user ec2-user    28287 Apr 10 18:24 plyrmr_0.6.0.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    25105 Apr 10 18:24 rhdfs_1.0.8.tar.gz
-rw-r--r-- 1 ec2-user ec2-user    63087 Apr 10 18:24 rmr2_3.3.1.tar.gz

 

And my 5-node cluster in EC2 runs:

[root@ip-172-30-2-9 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)

New Contributor

Hello,

 

These directions are good.

 

But when I try to install

  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz

 


I get an error:

  [root@hostname:username]# R CMD INSTALL rmr2_3.1.0.tar.gz
  Error in getOctD(x, offset, len) : invalid octal digit

tar -tvf rmr2_3.1.0.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

https://github.com/RevolutionAnalytics/rmr2/tree/enterprise/build

 

I have tried different repos, but I'm at a loss.

Any thoughts would be helpful.

Master Collaborator

I suspect it's an issue with the version of tar on your system, perhaps BSD vs. GNU? Just a guess. Or maybe the file is corrupted? The latest rmr2 archive uncompressed OK for me on OS X: https://github.com/RevolutionAnalytics/rmr2/releases
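
To rule out the corrupted-file case, you can check whether the download really starts with the gzip magic bytes (1f 8b). One possibility, not confirmed in this thread, is that a web page was saved under the .tar.gz name instead of the archive itself. The helper below is a sketch; is_gzip is a hypothetical name.

```shell
# Sketch only: succeed if the file begins with the gzip magic bytes 1f 8b.
# is_gzip is a hypothetical helper, not part of tar or gzip.
is_gzip() {
  # od prints the first two bytes as hex; tr strips spaces and the newline.
  magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Usage:
# is_gzip /tmp/rmr2_3.1.0.tar.gz || echo "not a gzip archive; download it again"
```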

New Contributor
Found the right repo. Thanks!

Explorer

Hi,

When I try to install rmr2_3.3.1.tar.gz on CDH 5.7.4, I get the following error. Can you help?

 

Thank you very much,

Garry

 

build_linux.sh: line 163: [: missing `]'
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2 as hadoop home
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/lib/hbase as hbase home

Copying libs into local build directory
ls: cannot access /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/hadoop-*-core.jar: No such file or directory
/ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/hadoop-core-2.6.0-mr1-cdh5.7.4.jar
Cannot find hadoop-streaming jar in hadoop homei
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/lib64/R/library/rmr2/libs
** R
** byte-compile and prepare package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘quickcheck’
Note: no visible binding for '<<-' assignment to '.Last'
Note: no visible binding for '<<-' assignment to '.Last'
** help
*** installing help indices
  converting help for package ‘rmr2’
    finding HTML links ... done
    bigdataobject                           html
    dfs.empty                               html
    equijoin                                html
    fromdfstodfs                            html
    hadoop-setting                          html
    keyval                                  html
    make.io.format                          html
    mapreduce                               html
    rmr-package                             html
    rmr.options                             html
    rmr.sample                              html
    rmr.str                                 html
    scatter                                 html
    status                                  html
    tomaptoreduce                           html
    vsum                                    html
** building package indices
** testing if installed package can be loaded
Warning: S3 methods ‘gorder.default’, ‘gorder.factor’, ‘gorder.data.frame’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
* DONE (rmr2)

New Contributor

Hi,

 

I am following the steps at this link to install RHadoop on Cloudera:

https://ashokharnal.wordpress.com/2013/08/25/installing-r-rhadoop-and-rstudio-over-cloudera-hadoop-e...

 

Will it work for Cloudera 1.6?

 

Thanks.

 
