Hi,
I have configured a 4-node cluster on CDH 5.3, and now I want to install RHadoop on it, but I couldn't find any documentation on how to do that.
Can you provide some details?
Best regards,
Created 03-12-2015 01:54 AM
Hi,
I did find a useful article about that:
Created 03-12-2015 05:47 AM
I don't think there's anything special to know, beyond what's documented in the RHadoop subprojects. So it's not something that we ship, support or document separately. I have set up the rhadoop libraries with CDH and it's straightforward.
It's really a set of client-side libraries that you install into *R*, not *Hadoop*. However, to run rmr2 you will need R installed locally on all of your Hadoop cluster nodes, since it runs MapReduce jobs that execute R scripts.
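Since rmr2 needs R on every node, a quick per-node check can save a failed job later. This is only a minimal sketch, assuming a POSIX shell; `missing_cmds` is a hypothetical helper name, not something that ships with RHadoop or CDH:

```shell
#!/bin/sh
# Print each command from the argument list that is NOT on the PATH.
# Exit status is 0 when nothing is missing, 1 otherwise.
missing_cmds() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}

# Example: run this on every cluster node before installing rmr2.
# missing_cmds R Rscript hadoop
```

Anything it prints still has to be installed on that node.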
I recall that you have to install a bunch of other R packages before installing the rhdfs/rhbase/plyrmr libraries, and I found this in my notes as the set of prerequisites:
  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
                     "functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
                     "dplyr", "R.methodsS3", "Hmisc"))
Created 04-13-2015 04:07 PM
I am installing R and RStudio on CDH 5.3.2, but I ran into an issue when installing rmr2:
install.packages("/home/ec2-user/R/rmr2_3.3.1.tar.gz", repos = NULL, type="source")
[javac] /tmp/RtmpVvuf0G/R.INSTALL341d4f985503/rmr2/src/hbase-io/src/java/com/dappervision/hbase/mapred/TypedBytesTableInputFormatBase.java:164: error: cannot find symbol
[javac] String regionLocation = table.getRegionLocation(startKeys[startPos]).
[javac] ^
[javac] symbol: method getServerAddress()
[javac] location: class HRegionLocation
In the source code, line 164 is like this:
String regionLocation = table.getRegionLocation(startKeys[startPos]).
getServerAddress().getHostname();
I searched the API and could not find a getServerAddress() method on HRegionLocation.
I downloaded rmr2 from https://github.com/RevolutionAnalytics/RHadoop/wiki (following these instructions: https://ashokharnal.wordpress.com/2014/01/16/installing-r-rhadoop-and-rstudio-over-cloudera-hadoop-e...), so the issue could be that this tar.gz file is meant for CDH 4.
Do you know where I can download the source code for CDH 5?
Thanks,
Bin
Created on 04-13-2015 04:19 PM - edited 04-13-2015 04:20 PM
I suspect it is because the rmr2 integration code is compatible with an older version of HBase than what is shipped in CDH 5.3.
The link you cited returns a 404 for me, but it seems to me that you are in fact using the latest rmr2 and building from source, which is the right thing to do. I have installed rmr2 on CDH 5.2 before. There aren't special versions you need to find.
I dug out my notes on how I installed several of these libraries before. Maybe they help? For example, I installed them differently, with R CMD. Of course, you may wish to use more recent versions of these libraries than the ones mentioned in the notes.
Basically you just...
  export HADOOP_CMD=`which hadoop`
  R
  ...
  library(plyrmr)
and go to it.
HOW TO
Copy the packages rmr2_3.1.0.tar.gz, rhdfs_1.0.8.tar.gz and plyrmr_0.2.0.tar.gz to each node at, say, /tmp.
For each node:
  export HADOOP_CMD=`which hadoop`
  export HADOOP_STREAMING=`ls /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming-*.jar`
As root, install R:
  yum install R
This installs version 3.0.2 on my cluster. Run R to install some dependencies:
  R --vanilla
Once in R:
  install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
                     "functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
                     "dplyr", "R.methodsS3", "Hmisc"))
(choose a mirror that's local when you are prompted)
Install packages, back on the command line:
  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz
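The two exports above can be made a little more defensive, so that a missing or ambiguous streaming jar is caught before R starts. This is a rough sketch, assuming a POSIX shell; `find_streaming_jar` is a hypothetical helper name, and the directory in the usage comment is simply the one from the notes above, which may differ on other installs:

```shell
#!/bin/sh
# Print the path of the single hadoop-streaming jar in the given directory.
# Fails (non-zero status, message on stderr) if there are zero or several matches.
find_streaming_jar() {
    dir=$1
    # Expand the glob into the function's positional parameters.
    set -- "$dir"/hadoop-streaming-*.jar
    if [ ! -e "$1" ]; then
        echo "no hadoop-streaming jar in $dir" >&2
        return 1
    fi
    if [ "$#" -gt 1 ]; then
        echo "several hadoop-streaming jars in $dir, pick one explicitly" >&2
        return 1
    fi
    echo "$1"
}

# Usage on each node (directory is an assumption taken from the notes above):
# HADOOP_CMD=$(command -v hadoop)
# HADOOP_STREAMING=$(find_streaming_jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce) || exit 1
# export HADOOP_CMD HADOOP_STREAMING
```

Unlike the bare `ls`, this stops with a message instead of exporting an empty or multi-line value.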
Created 04-13-2015 04:48 PM
Sorry, the link is https://ashokharnal.wordpress.com/2014/01/16/installing-r-rhadoop-and-rstudio-over-cloudera-hadoop-e...
The rmr2, rhdfs and plyrmr versions I downloaded are:
-rw-r--r-- 1 ec2-user ec2-user 28287 Apr 10 18:24 plyrmr_0.6.0.tar.gz
-rw-r--r-- 1 ec2-user ec2-user 25105 Apr 10 18:24 rhdfs_1.0.8.tar.gz
-rw-r--r-- 1 ec2-user ec2-user 63087 Apr 10 18:24 rmr2_3.3.1.tar.gz
And my 5-node cluster in EC2 runs:
[root@ip-172-30-2-9 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
Created 07-27-2015 01:26 PM
Hello,
These directions are good.
But when I try to install
  R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
  R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
  R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz
I get an error:
  [root@hostname:username]# R CMD INSTALL rmr2_3.1.0.tar.gz
  Error in getOctD(x, offset, len) : invalid octal digit
https://github.com/RevolutionAnalytics/rmr2/tree/enterprise/build
I have tried different repos, but I'm at a loss.
Any thoughts would be helpful.
Created 07-27-2015 01:41 PM
I suspect it's an issue with the version of tar on your system. BSD vs GNU? Just a guess. That, or maybe a corrupted file? The latest rmr2 archive uncompressed OK for me on OS X: https://github.com/RevolutionAnalytics/rmr2/releases
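If a corrupted or incomplete download is the cause, it can be ruled out before calling R CMD INSTALL. This is a minimal sketch, assuming gzip and tar are on the PATH; `check_archive` is a hypothetical helper name:

```shell
#!/bin/sh
# Return 0 if the file is a readable gzip-compressed tar archive, 1 otherwise.
check_archive() {
    archive=$1
    # gzip -t verifies the integrity of the compressed stream.
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "gzip check failed: $archive" >&2
        return 1
    fi
    # Listing the contents makes tar parse every header in the archive.
    if ! tar -tzf "$archive" >/dev/null 2>&1; then
        echo "tar listing failed: $archive" >&2
        return 1
    fi
    echo "ok: $archive"
}

# Example:
# check_archive /tmp/rmr2_3.1.0.tar.gz && R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
```

A file that is not really a gzip-compressed tar (for example, an HTML page saved under a .tar.gz name) fails the first check.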
Created 07-27-2015 03:52 PM
Created 11-08-2016 06:51 AM
Hi,
When I try to install rmr2_3.3.1.tar.gz on CDH 5.7.4, I get the following error. Can you help?
Thank you very much,
Garry
build_linux.sh: line 163: [: missing `]'
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2 as hadoop home
Using /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/lib/hbase as hbase home
Copying libs into local build directory
ls: cannot access /ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/hadoop-*-core.jar: No such file or directory
/ec2_oth/cloudera/parcels/CDH-5.7.4-1.cdh5.7.4.p0.2/hadoop-core-2.6.0-mr1-cdh5.7.4.jar
Cannot find hadoop-streaming jar in hadoop homei
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/lib64/R/library/rmr2/libs
** R
** byte-compile and prepare package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘quickcheck’
Note: no visible binding for '<<-' assignment to '.Last'
Note: no visible binding for '<<-' assignment to '.Last'
** help
*** installing help indices
  converting help for package ‘rmr2’
    finding HTML links ... done
    bigdataobject                           html
    dfs.empty                               html
    equijoin                                html
    fromdfstodfs                            html
    hadoop-setting                          html
    keyval                                  html
    make.io.format                          html
    mapreduce                               html
    rmr-package                             html
    rmr.options                             html
    rmr.sample                              html
    rmr.str                                 html
    scatter                                 html
    status                                  html
    tomaptoreduce                           html
    vsum                                    html
** building package indices
** testing if installed package can be loaded
Warning: S3 methods ‘gorder.default’, ‘gorder.factor’, ‘gorder.data.frame’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
* DONE (rmr2)