Introduction

h2o is an R package for running H2O via its REST API from within R. It lets the user run basic H2O commands as R commands. No data is stored in the R workspace, and no computation is carried out by R. R only keeps the named objects that uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON response with the relevant information that R then displays in the console.
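
The client/server split described above can be sketched as follows. This is a minimal illustration, assuming the h2o package is installed and an H2O instance has already been started with h2o.init(). The helper name upload_and_inspect is hypothetical; as.h2o() and h2o.ls() are functions of the h2o package.

```r
# Sketch: upload a small data set and inspect what R actually holds.
# upload_and_inspect is a hypothetical helper name, not part of h2o.
upload_and_inspect <- function(df) {
  hf <- h2o::as.h2o(df)            # the data is sent to the H2O server
  list(
    handle_class = class(hf),      # R keeps a reference object, not the raw data
    server_keys  = h2o::h2o.ls(),  # objects currently stored on the server
    n_rows       = nrow(hf)        # answered by querying the server
  )
}
```

After library(h2o) and h2o.init(), calling str(upload_and_inspect(iris)) should show the class of the handle and the keys held by the server.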

Scope

I tested this installation guide on CentOS 7.2, but it should work on similar Red Hat/Fedora/CentOS systems.

Steps

1. Install R

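# On CentOS/RHEL, R comes from the EPEL repository; enable it first if needed
sudo yum install epel-release
sudo yum install R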

2. Install Java

https://www.java.com/en/download/help/linux_x64rpm_install.xml

3. Start R and install dependencies

install.packages("RCurl")
install.packages("bitops")
install.packages("rjson")
install.packages("statmod")
# "tools" ships with base R, so it does not need to be installed
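
If some of these packages are already present, the step can be reduced to installing only what is missing. The following is a sketch under that assumption; missing_packages is a hypothetical helper name, while installed.packages() and install.packages() are base R functions.

```r
# Sketch: find which of the required packages are not yet installed.
# missing_packages is a hypothetical helper name.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Only run the installation from an interactive R session,
# where a CRAN mirror can be chosen if none is configured.
if (interactive()) {
  to_install <- missing_packages(c("RCurl", "bitops", "rjson", "statmod"))
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
}
```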

4. Install h2o package and load library for use

install.packages("h2o")
library(h2o)

If this is your first time using CRAN, it will ask for a mirror to use. If you want H2O installed site-wide (i.e., usable by all users on that machine), run R as root (sudo R), then type install.packages("h2o").

5. Test H2O installation

Type:

library(h2o)

If nothing complains, launch h2o:

h2o.init()

If all went well, you’ll see lots of output about how it is starting up H2O on your behalf, followed by a summary of your cluster. If not, the error message should tell you which dependency is missing or what the problem is. Post a note to this article and I will get back to you.
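
Beyond reading the startup output, a quick round trip can confirm that the server answers requests. This is only a sketch, assuming the h2o package and Java are installed; smoke_test_h2o is a hypothetical helper name, while h2o.init(), h2o.clusterIsUp(), and as.h2o() are functions of the h2o package.

```r
# Sketch: a small end-to-end check that H2O starts and answers requests.
# smoke_test_h2o is a hypothetical helper name.
smoke_test_h2o <- function() {
  h2o::h2o.init()                           # start or connect to a local H2O instance
  hf <- h2o::as.h2o(data.frame(x = 1:10))   # send a tiny data set to the server
  list(
    cluster_up = h2o::h2o.clusterIsUp(),    # is the server still reachable?
    rows_ok    = nrow(hf) == 10             # did the server store all ten rows?
  )
}
```

After library(h2o), calling smoke_test_h2o() should return a list whose two entries are both TRUE on a working installation.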

Tips

#1 - The version of H2O on CRAN might be up to a month or two behind the latest and greatest. Unless you are affected by a bug that you know has been fixed, don’t worry about it.

#2 - By default, h2o.init() will only use two cores on your machine and maybe a quarter of your system memory. To change the resources, call h2o.shutdown() and start H2O again:

a) using all your cores:

h2o.init(nthreads = -1)

b) using all your cores and 4 GB:

h2o.init(nthreads = -1, max_mem_size = "4g")

#3 - To run H2O on your local machine, you can call h2o.init() without any arguments, and H2O will be launched automatically at localhost:54321, where the IP is "127.0.0.1" and the port is 54321.

#4 - If H2O is running on a cluster, you must provide the IP and port of the remote machine as arguments to the h2o.init() call. Operations are then carried out on the server where H2O is running and the data resides, not within the R environment.
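
A connection to a remote instance might look like the sketch below. The wrapper name connect_remote_h2o and the default address 192.0.2.10 are placeholders chosen for illustration, not values from this article; ip, port, and startH2O are arguments of h2o.init().

```r
# Sketch: connect to an H2O instance that is already running elsewhere.
# connect_remote_h2o and the default IP are hypothetical placeholders.
connect_remote_h2o <- function(ip = "192.0.2.10", port = 54321) {
  # startH2O = FALSE: only connect; do not launch a local instance on failure
  h2o::h2o.init(ip = ip, port = port, startH2O = FALSE)
}
```

With startH2O = FALSE, a failed connection produces an error instead of silently starting a local instance, which makes a mistyped address easier to notice.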

Tutorials

H2O Tutorial on the Hortonworks Data Platform Sandbox:

http://hortonworks.com/blog/oxdata-h2o-tutorial-hortonworks-sandbox/

Walk-Through Tutorials for Web UI:

http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/tutorial/top.html

Last update: 12-26-2016 09:02 PM