Created on 12-26-2016 09:02 PM
h2o is a package for running H2O via its REST API from within R. It lets the user run basic H2O commands using R commands. No data is stored in the R workspace, and no computation is carried out by R. R only keeps the named objects that uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON response with the relevant information that R then displays in the console.
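To make the request/response cycle concrete, here is a rough sketch of what such a query could look like if done by hand. The helper names h2o_cloud_url and h2o_cloud_status are hypothetical, the /3/Cloud cluster-status endpoint and the default address are assumptions about the running server, and the fetch only works while an H2O server is up and the RCurl and rjson packages are installed. The h2o package does all of this for you.

```r
# Build the URL of a cluster-status endpoint (assumed here to be /3/Cloud).
h2o_cloud_url <- function(ip = "127.0.0.1", port = 54321) {
  sprintf("http://%s:%d/3/Cloud", ip, port)
}

# Fetch the JSON reply and parse it into an R list.
# Requires a running H2O server plus the RCurl and rjson packages.
h2o_cloud_status <- function(ip = "127.0.0.1", port = 54321) {
  raw <- RCurl::getURL(h2o_cloud_url(ip, port))
  rjson::fromJSON(raw)
}

# Example (commented out because it needs a live server):
# str(h2o_cloud_status())
```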
I tested this installation guide on CentOS 7.2, but it should work on similar RedHat/Fedora/Centos…
1. Install R
sudo yum install R
2. Install Java
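H2O runs on the JVM, so once R is started in the next step it can be useful to confirm that a java executable is visible to R. A minimal sketch using only base R; java_available is a hypothetical helper name, and finding java on the PATH does not guarantee that the version is one H2O supports:

```r
# TRUE if an executable called "java" is found on the PATH, FALSE otherwise.
java_available <- function() {
  unname(nzchar(Sys.which("java")))
}

if (!java_available()) {
  message("No 'java' executable found on the PATH; install Java before starting H2O.")
}
```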
3. Start R and install dependencies
install.packages("RCurl")
install.packages("bitops")
install.packages("rjson")
install.packages("statmod")
# tools ships with base R, so it does not need to be installed
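The dependency step can also be written so that only the packages that are actually missing get installed, which makes it safe to rerun. A minimal sketch; missing_packages and install_missing are hypothetical helper names, and the package list is the one from this step:

```r
# Dependencies listed in this guide (tools is part of base R and is omitted).
deps <- c("RCurl", "bitops", "rjson", "statmod")

# Return the subset of pkgs that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only what is missing; returns the names it tried to install.
install_missing <- function(pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) install.packages(todo)
  invisible(todo)
}

# Example (commented out because it needs network access to a CRAN mirror):
# install_missing(deps)
```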
4. Install h2o package and load library for use
install.packages("h2o")
library(h2o)
If this is your first time using CRAN, it will ask for a mirror to use. If you want H2O installed site-wide (i.e., usable by all users on that machine), run R as root (sudo R), then type install.packages("h2o").
5. Test H2O installation
If nothing complains, launch H2O:
h2o.init()
If all went well then you’ll see lots of output about how it is starting up H2O on your behalf, and then it should tell you all about your cluster. If not, the error message should be telling you what dependency is missing, or what the problem is. Post a note to this article and I will get back to you.
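After startup, the cluster state can be double-checked from R. A minimal sketch, assuming the h2o package is installed; check_h2o is a hypothetical helper name, while h2o.init(), h2o.clusterInfo() and h2o.clusterIsUp() are functions from the h2o package:

```r
# Start (or attach to) a local H2O instance and report whether it is reachable.
check_h2o <- function() {
  h2o::h2o.init()          # launches H2O locally if it is not already running
  h2o::h2o.clusterInfo()   # prints version, node count, memory and cores
  h2o::h2o.clusterIsUp()   # TRUE when the cluster answers requests
}

# Example (commented out because it starts a JVM):
# check_h2o()
```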
#1 - The version of H2O on CRAN might be up to a month or two behind the latest and greatest. Unless you are affected by a bug that you know has been fixed, don’t worry about it.
#2 - By default, h2o.init() will only use two cores on your machine and maybe a quarter of your system memory. To change the resources, use h2o.shutdown() and start it again:
a) using all your cores:
h2o.init(nthreads = -1)
b) using all your cores and 4 GB:
h2o.init(nthreads = -1, max_mem_size = "4g")
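The shutdown-and-restart sequence can be wrapped in one function. A minimal sketch, assuming the h2o package is installed; restart_h2o is a hypothetical helper name, and the two-second pause is an arbitrary guess at how long the old instance needs to exit:

```r
# Stop the running H2O instance and start a new one with different resources.
restart_h2o <- function(nthreads = -1, max_mem_size = "4g") {
  # prompt = FALSE skips the interactive "are you sure?" question.
  h2o::h2o.shutdown(prompt = FALSE)
  # Give the old JVM a moment to exit before relaunching (arbitrary delay).
  Sys.sleep(2)
  h2o::h2o.init(nthreads = nthreads, max_mem_size = max_mem_size)
}

# Example (commented out because it needs a running H2O instance):
# restart_h2o(nthreads = -1, max_mem_size = "4g")
```

Note that shutting down discards everything held in the H2O instance, so data and models need to be loaded again afterwards.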
#3 - To run H2O on your local machine, you could call h2o.init without any arguments, and H2O will be automatically launched at localhost:54321, where the IP is "127.0.0.1" and the port is 54321.
#4 - If H2O is running on a cluster, you must provide the IP and port of the remote machine as arguments to the h2o.init() call. The operation will be done on the server associated with the data object where H2O is running, not within the R environment.
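Connecting to an H2O instance that is already running elsewhere might look like the sketch below. connect_remote_h2o is a hypothetical helper name and the address in the example is a documentation-only placeholder; startH2O = FALSE tells h2o.init() to attach to the existing server instead of launching a local one.

```r
# Attach the R session to an H2O server that is already running remotely.
connect_remote_h2o <- function(ip, port = 54321) {
  # startH2O = FALSE: never launch a local instance, only connect.
  h2o::h2o.init(ip = ip, port = port, startH2O = FALSE)
}

# Example (commented out; 192.0.2.10 is a placeholder, not a real server):
# connect_remote_h2o("192.0.2.10", 54321)
```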
H2O Tutorial on the Hortonworks Data Platform Sandbox:
Walk-Through Tutorials for Web UI: