Created on 12-26-2016 09:02 PM - edited 09-16-2022 01:37 AM
Introduction
h2o is a package for running H2O via its REST API from within R. It lets the user run basic H2O commands as R commands. No actual data is stored in the R workspace, and no actual work is carried out by R. R only keeps the named objects, which uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON response with the relevant information that R then displays in the console.
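As a rough illustration of this handle-only design, the sketch below uploads one of R's built-in data sets and prints what R keeps locally. It assumes the h2o package is already installed and that a local H2O instance can be started; the function name `show_handle_demo` and the frame name "iris_demo" are made up for this example.

```r
# Minimal sketch: the R object is only a handle, the data lives on the H2O server.
show_handle_demo <- function() {
  library(h2o)
  h2o.init()  # start, or connect to, a local H2O instance

  # Upload R's built-in iris data; "iris_demo" is a hypothetical name.
  iris_hf <- as.h2o(iris, destination_frame = "iris_demo")

  print(class(iris_hf))  # an H2OFrame handle, not a data.frame
  print(h2o.ls())        # keys of the objects stored on the server
  print(dim(iris_hf))    # dimensions are looked up on the server
  invisible(iris_hf)
}
```

Nothing runs until you call `show_handle_demo()` from an R session.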
Scope
I tested this installation guide on CentOS 7.2, but it should work on similar RedHat/Fedora/CentOS…
If this is your first time using CRAN, R will ask you for a mirror to use. If you want H2O installed site-wide (i.e., usable by all users on that machine), run R as root (sudo R), then type
install.packages("h2o")
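If you would rather not answer the mirror prompt, a minimal sketch along these lines passes the mirror explicitly. The helper name `install_h2o_if_missing` and the mirror URL are assumptions for this example; any CRAN mirror should work.

```r
# Install h2o from CRAN only when it is not already present.
# Assumption: the mirror URL below is just one choice of CRAN mirror.
install_h2o_if_missing <- function(mirror = "https://cloud.r-project.org") {
  if (requireNamespace("h2o", quietly = TRUE)) {
    message("h2o ", as.character(utils::packageVersion("h2o")),
            " is already installed")
    return(invisible(FALSE))
  }
  # Passing repos explicitly skips the interactive mirror question.
  utils::install.packages("h2o", repos = mirror)
  invisible(TRUE)
}
```

Calling `install_h2o_if_missing()` from an R session started with sudo R should place the package in the first writable library listed by `.libPaths()`, which for root is normally a system-wide one.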
5. Test H2O installation
Type:
library(h2o)
If there are no errors, launch H2O:
h2o.init()
If all went well, you’ll see a lot of output as H2O starts up on your behalf, followed by a summary of your cluster. If not, the error message should tell you which dependency is missing or what the problem is. Post a note to this article and I will get back to you.
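The same check can be scripted, for example to verify a fresh machine. This is only a sketch: the function name `check_h2o` is made up, and it assumes h2o is installed and Java is available.

```r
# Smoke test: load the package, start H2O, and confirm the cluster answers.
check_h2o <- function() {
  library(h2o)
  h2o.init()                    # prints the start-up output described above
  stopifnot(h2o.clusterIsUp())  # stops with an error if the server is not reachable
  h2o.clusterInfo()             # prints version, node count, memory, and cores
  invisible(TRUE)
}
```

Run `check_h2o()` once after installation; if it stops with an error, the message from `h2o.init()` is usually the place to start looking.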
Tips
#1 - The version of H2O on CRAN might be up to a month or two
behind the latest and greatest. Unless you are affected by a bug that you know
has been fixed, don’t worry about it.
#2 - By default, h2o.init() will only use two cores on your machine and maybe a quarter of your system memory. To change the resources, call h2o.shutdown() and start H2O again:
a) using all your cores:
h2o.init(nthreads = -1)
b) using all your cores and 4 GB:
h2o.init(nthreads = -1, max_mem_size = "4g")
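Put together, the shutdown and restart could look like the sketch below. The wrapper name `restart_h2o` and the five-second pause are assumptions for this example; the pause is only a guess at how long the old Java process needs to exit.

```r
# Stop the running H2O instance (if any) and start a new one with more resources.
restart_h2o <- function(nthreads = -1, max_mem_size = "4g") {
  library(h2o)
  # h2o.shutdown() raises an error when no instance is connected; ignore that case.
  tryCatch(h2o.shutdown(prompt = FALSE), error = function(e) invisible(NULL))
  Sys.sleep(5)  # assumption: a few seconds is enough for the old instance to exit
  h2o.init(nthreads = nthreads, max_mem_size = max_mem_size)
}
```

`restart_h2o()` uses all cores and 4 GB; `restart_h2o(max_mem_size = "8g")` would ask for 8 GB instead. Keep in mind that shutting down discards everything stored on that H2O instance.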
#3 - To run H2O on your local machine, you can call h2o.init() without any arguments, and H2O will be launched automatically at localhost:54321, where the IP is "127.0.0.1" and the port is 54321.
#4 - If H2O is running on a cluster, you must provide the IP and port of the remote machine as arguments to the h2o.init() call. Operations are then carried out on the server where H2O is running and where the data object lives, not within the R environment.
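A minimal sketch of such a remote connection follows. The wrapper name `connect_remote_h2o` is made up, and the address in the usage note is a documentation placeholder, not a real server.

```r
# Connect to an H2O instance that is already running on another machine.
connect_remote_h2o <- function(ip, port = 54321) {
  library(h2o)
  # startH2O = FALSE: only connect, never try to launch a local instance.
  h2o.init(ip = ip, port = port, startH2O = FALSE)
}
```

For example, `connect_remote_h2o("192.0.2.10")` would try to reach an H2O node at that placeholder address on the default port 54321.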
Tutorials
H2O Tutorial on the Hortonworks Data Platform Sandbox: