Community Articles

srai1 · ‎06-15-2016

Goals
Set up HDB 2.0 on the HDP 2.4.0.0 Sandbox
Access HDB 2.0 interactively via pgAdmin3

Notes

This effort is to get up and running with HDB 2.0 on the Hortonworks 2.4.0.0 Tryout Sandbox. The steps mentioned here are not intended for production use and should serve only as a reference.
HDB, a.k.a. HAWQ, will eventually be integrated as a service similar to other add-ons such as Hive, HBase, etc.
This setup was completed using the HDP 2.4.0.0 Sandbox, which can be downloaded here
The article assumes that the Sandbox is up and running on VMware Fusion or VirtualBox
Reference for this article: https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install
For convenience, we will use HAWQ as the term for HDB 2.0

Installing HAWQ on HDP 2.4.0.0

HAWQ 2.0 is one of the latest releases from Pivotal and can also be configured on other versions of HDP 2.x, as described in these articles:

https://community.hortonworks.com/content/kbentry/20420/install-apache-hawq-on-hdp-234.html
https://community.hortonworks.com/content/kbentry/34193/install-hdb-hawq-via-ambari-and-use-zeppelin...

Log in to the Sandbox via Terminal (if you are on OS X) or via PuTTY, and ensure that you have superuser privileges. Then create a directory for the dependencies and other binaries used throughout this article. The remaining commands are run from this directory.

[root@sandbox ~]# mkdir -p /stage

Update the sandbox packages to avoid any dependency issues

[root@sandbox stage]# yum update

Log in to Pivotal's binary download portal at network.pivotal.io, download the following binaries, and copy them over to the Hortonworks Sandbox virtual machine
- hdb-ambari-plugin-2.0.0-448.tar.gz
- hdb-2.0.0.0-22126.tar.gz
Uncompress and untar the archives, then set up the repositories

[root@sandbox stage]# ls -lrth
total 146M
-rw-r--r-- 1 root root  25K Jun 15 17:30 hdb-ambari-plugin-2.0.0-448.tar.gz
-rw-r--r-- 1 root root 146M Jun 15 17:30 hdb-2.0.0.0-22126.tar.gz
[root@sandbox stage]# tar -xzf hdb-2.0.0.0-22126.tar.gz 
[root@sandbox stage]# tar -xzf hdb-ambari-plugin-2.0.0-448.tar.gz 
[root@sandbox stage]# bash hdb-2.0.0.0/setup_repo.sh 
HDB Repo file successfully created at /etc/yum.repos.d/HDB.repo.
Use http://sandbox.hortonworks.com/HDB to access the repository.
[root@sandbox stage]# 
[root@sandbox stage]# bash hdb-ambari-plugin-2.0.0/setup_repo.sh 
HDB-AMBARI-PLUGIN Repo file successfully created at /etc/yum.repos.d/HDB-AMBARI-PLUGIN.repo.
Use http://sandbox.hortonworks.com/HDB-AMBARI-PLUGIN to access the repository.
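
Before moving on, it can help to confirm that both repo files named in the output above were actually written. The following is a minimal sketch, not part of the HDB tooling: the function name missing_hdb_repos is hypothetical, and the default directory /etc/yum.repos.d is taken from the setup_repo.sh messages shown above.

```shell
# Hypothetical helper (not shipped with HDB): print the name of each
# expected repo file that is missing from the given yum repo directory.
missing_hdb_repos() {
  repo_dir="${1:-/etc/yum.repos.d}"
  for repo in HDB.repo HDB-AMBARI-PLUGIN.repo; do
    # Report only the files that are absent.
    [ -f "$repo_dir/$repo" ] || echo "$repo"
  done
}
```

Running missing_hdb_repos with no arguments prints nothing when both files are in place. Any file name it prints points to a setup_repo.sh step that should be rerun.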

Verify that the repositories are configured for HAWQ as well as the Ambari plugin

[root@sandbox stage]# yum provides hdb\*
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * base: mirrors.lga7.us.voxel.net
 * epel: mirror.steadfast.net
 * extras: pubmirrors.dal.corespace.com
 * updates: mirrors.cmich.edu
hdb-ambari-plugin-2.0.0-448.noarch : hdb-ambari-plugin
Repo        : HDB-AMBARI-PLUGIN
Matched from:
Other       : hdb-ambari-plugin = 2.0.0-448


[root@sandbox stage]# yum provides hawq
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * base: mirrors.lga7.us.voxel.net
 * epel: mirror.steadfast.net
 * extras: pubmirrors.dal.corespace.com
 * updates: mirrors.cmich.edu
hawq-2.0.0.0-22126.x86_64 : Pivotal HDB, Hadoop Native SQL powered by Apache HAWQ (incubating)
Repo        : HDB
Matched from:

Install hdb-ambari-plugin for HAWQ

[root@sandbox stage]# yum install hdb-ambari-plugin

Log in to the Ambari web portal and verify that HAWQ is available as a service that can be added just like any other service

Add the following custom property to hdfs-site.xml via Ambari and set its value to true

dfs.allow.truncate
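
Once Ambari has saved the change, the property can also be checked from the command line. The sketch below makes two assumptions: that the rendered client configuration is at /etc/hadoop/conf/hdfs-site.xml, and that the property is written in the usual <name>/<value> layout. The function name truncate_enabled is hypothetical.

```shell
# Hypothetical helper: succeed only if dfs.allow.truncate is set to true
# in the given hdfs-site.xml (the default path is an assumption).
truncate_enabled() {
  site_file="${1:-/etc/hadoop/conf/hdfs-site.xml}"
  # Return 2 when the file cannot be read at all.
  [ -r "$site_file" ] || return 2
  # Flatten the XML onto one line, then look for the name element
  # immediately followed by a value element containing "true".
  tr -d '\n' < "$site_file" \
    | grep -Eq '<name>dfs\.allow\.truncate</name>[[:space:]]*<value>true</value>'
}
```

For example, `truncate_enabled && echo "truncate is enabled"` prints the message only when the property is present and set to true.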

Restart the HDFS service via Ambari
Proceed with adding HAWQ via Ambari as a new service
During the "Customize Services" phase, enter port number 10432 (or any other free port beyond the Linux internal ports), because 5432 is already used by the PostgreSQL database in which Ambari stores its metadata.
Proceed with the configuration and deploy. The setup should complete, though with warnings
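
The port rule above can be expressed as a small check. This sketch reads "beyond Linux internal ports" as "above the privileged range 1-1023", which is an interpretation rather than something the steps above state. The function name valid_hawq_port is hypothetical, and the check does not test whether the port is actually free.

```shell
# Hypothetical check: accept a port only if it is numeric, above the
# privileged range (assumed here to be 1-1023), at most 65535, and not
# 5432, which Ambari's PostgreSQL database already uses.
valid_hawq_port() {
  port="$1"
  case "$port" in
    # Reject empty input and anything containing a non-digit.
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$port" -gt 1023 ] && [ "$port" -le 65535 ] && [ "$port" -ne 5432 ]
}
```

With this rule, 10432 is accepted while 5432 and 80 are rejected.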

NOTE: HAWQ tries to initialize the cluster with its default (hardcoded) values for parallel connections and shared buffers, which are 3000 and 4000 respectively.

As the gpadmin user, manually initialize HAWQ from the command line with reduced max_connections and shared_buffers values

[root@sandbox stage]# su - gpadmin
[gpadmin@sandbox ~]$ hawq init cluster --max_connections 15 --shared_buffers 500

This should bring up the cluster, which can then be tested. The full session looks like this:

[gpadmin@sandbox ~]$ hawq init cluster --max_connections 15 --shared_buffers 500
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-Prepare to do 'hawq init'
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-You can find log in:
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_init_20160615.log
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-GPHOME is set to:
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-/usr/local/hawq/.
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init hawq with args: ['init', 'cluster']


Continue with HAWQ init Yy|Nn (default=N):
> y
20160615:18:32:25:046085 hawq_init:sandbox:gpadmin-[INFO]:-No standby host configured, skip it
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-Check if hdfs path is available
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[WARNING]:-2016-06-15 18:32:26.024369, p46198, th140320952715232, WARNING the number of nodes in pipeline is 1 [sandbox.hortonworks.com(172.16.105.137)], is less than the expected number of replica 3 for block [block pool ID: BP-267552868-172.16.137.143-1457691099567 block ID 1073742404_1585] file /hawq_default/testFile
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-1 segment hosts defined
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-Set default_hash_table_bucket_number as: 6
20160615:18:32:31:046085 hawq_init:sandbox:gpadmin-[INFO]:-Start to init master node: 'sandbox.hortonworks.com'
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-20160615:18:32:39:046409 hawqinit.sh:sandbox:gpadmin-[INFO]:-Loading hawq_toolkit...
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Master init successfully
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init segments in list: ['sandbox.hortonworks.com']
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Total segment number is: 1
.........
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-1 of 1 segments init successfully
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-Segments init successfully on nodes '['sandbox.hortonworks.com']'
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init HAWQ cluster successfully
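
The same outcome can be confirmed later from the log file that hawq init reports in the output above. The following is a minimal sketch; the function name init_succeeded is hypothetical, and it only looks for the final success message shown in the transcript.

```shell
# Hypothetical helper: succeed if the given hawq_init log contains the
# final success message seen in the transcript above.
init_succeeded() {
  log_file="$1"
  # Return 2 when the log file is missing or unreadable.
  [ -r "$log_file" ] || return 2
  grep -q 'Init HAWQ cluster successfully' "$log_file"
}
```

For the run above, the call would be `init_succeeded /home/gpadmin/hawqAdminLogs/hawq_init_20160615.log`. If the log accumulates several runs from the same day, an earlier success would also match, so treat the result as a hint rather than proof.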

Verify HAWQ database access by creating a database and a table within it

[gpadmin@sandbox ~]$ psql template1
psql (8.2.15)
Type "help" for help.


template1=# create database mydb;
CREATE DATABASE
template1=# \c mydb
You are now connected to database "mydb" as user "gpadmin".
mydb=# 
mydb=# 
mydb=# CREATE TABLE mytable (col1 int, col2 int, col3 int);
CREATE TABLE
mydb=# INSERT INTO mytable select i,i,i from generate_series(0,1000)i;
INSERT 0 1001
mydb=# SELECT count(*) from mytable;
 count 
-------
  1001
(1 row)

mydb=#
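
The count of 1001 follows from generate_series(0,1000) being inclusive at both ends. The sketch below repeats the check non-interactively, assuming the mydb database and mytable table created above. Both function names are hypothetical, and the psql options used (-d, -t, -A, -c) are standard psql flags.

```shell
# Rows produced by generate_series(start, end) with the default step of 1:
# both endpoints are included, hence the +1.
series_count() {
  echo $(( $2 - $1 + 1 ))
}

# Hypothetical wrapper: compare a table's row count with an expected value.
# The table name is interpolated directly, so this is for illustration only.
verify_row_count() {
  db="$1"; table="$2"; expected="$3"
  # -t and -A strip headers and alignment so only the number is printed.
  actual=$(psql -d "$db" -t -A -c "SELECT count(*) FROM $table;") || return 2
  [ "$actual" = "$expected" ]
}
```

Run as gpadmin, `verify_row_count mydb mytable "$(series_count 0 1000)"` should succeed for the table populated above.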

At this point, HAWQ should be manageable via Ambari
