Community Articles

Find and share helpful community-sourced technical articles.
avatar
Guru
  • Goals
  • Setup HDB 2.0 on HDP 2.4.0.0 Sandbox
  • Access HDB 2.0 via pgAdmin3 for interactive access

Notes

  • This effort is to get up and running with HDB 2.0 on Hortonworks 2.4.0.0 Tryout Sandbox. This steps mentioned here are not intended for production usage and should be merely used as reference.
  • HDB, a.k.a HAWQ will eventually be integrated as a service similar to other addons like Hive, Hbase etcetera
  • This setup was completed using HDP 2.4.0.0 sandbox which can be downloaded here
  • Article assumes that Sandbox is up and running on VMWare Fusion or Virtual box
  • Reference for this article https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install
  • For the purpose of convenience, we will use HAWQ as the term for HDB 2.0

Installing HAWQ on HDP 2.4.0.0

HAWQ 2.0 is one of the latest release from Pivotal and can be configured in other version of HDP 2.x as reflected in the articles here:

https://community.hortonworks.com/content/kbentry/20420/install-apache-hawq-on-hdp-234.html https://community.hortonworks.com/content/kbentry/34193/install-hdb-hawq-via-ambari-and-use-zeppelin...

  • Login to Sandbox via Terminal if you are using an OSX or via Putty and ensure that you have superuser privileges. Once set, create a directory for dependencies and other binaries that we would use throughout this article
[root@sandbox ~]# mkdir -p /stage
  • Upgrade the sandbox to avoid any dependency issues
[root@sandbox stage]# yum update
  • Login to the Pivotal's binary download portal via network.pivotal.io & download the following binaries & copy them over to the Hortonworks Sandbox virtual machine
    • hdb-ambari-plugin-2.0.0-448.tar.gz
    • hdb-2.0.0.0-22126.tar.gz
  • Uncompress and untar the the directories and setup repository
[root@sandbox stage]# ls -lrth
total 146M
-rw-r--r-- 1 root root  25K Jun 15 17:30 hdb-ambari-plugin-2.0.0-448.tar.gz
-rw-r--r-- 1 root root 146M Jun 15 17:30 hdb-2.0.0.0-22126.tar.gz
[root@sandbox stage]# tar -xzf hdb-2.0.0.0-22126.tar.gz 
[root@sandbox stage]# tar -xzf hdb-ambari-plugin-2.0.0-448.tar.gz 
[root@sandbox stage]# bash hdb-2.0.0.0/setup_repo.sh 
HDB Repo file successfully created at /etc/yum.repos.d/HDB.repo.
Use http://sandbox.hortonworks.com/HDB to access the repository.
[root@sandbox stage]# 
[root@sandbox stage]# bash hdb-ambari-plugin-2.0.0/setup_repo.sh 
HDB-AMBARI-PLUGIN Repo file successfully created at /etc/yum.repos.d/HDB-AMBARI-PLUGIN.repo.
Use http://sandbox.hortonworks.com/HDB-AMBARI-PLUGIN to access the repository.
  • Verify if the setup is configured for HAWQ as well as Ambari Plugin
Matched from:[root@sandbox stage]# yum provides hdb\*
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * base: mirrors.lga7.us.voxel.net
 * epel: mirror.steadfast.net
 * extras: pubmirrors.dal.corespace.com
 * updates: mirrors.cmich.edu
hdb-ambari-plugin-2.0.0-448.noarch : hdb-ambari-plugin
Repo        : HDB-AMBARI-PLUGIN
Matched from:
Other       : hdb-ambari-plugin = 2.0.0-448


[root@sandbox stage]# yum provides hawq
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * base: mirrors.lga7.us.voxel.net
 * epel: mirror.steadfast.net
 * extras: pubmirrors.dal.corespace.com
 * updates: mirrors.cmich.edu
hawq-2.0.0.0-22126.x86_64 : Pivotal HDB, Hadoop Native SQL powered by Apache HAWQ (incubating)
Repo        : HDB
Matched from:
  • Install hdb-ambari-plugin for HAWQ
[root@sandbox stage]# yum install hdb-ambari-plugin
  • Login to the Ambari web portal and verify that HAWQ is available as a service which can be added just like any other service

5039-screen-shot-2016-06-15-at-105055-am.png

  • Add this custom property to hdfs-site.xml via Ambari and the value should be set to true
dfs.allow.truncate
  • Restart HDFS service via Ambari
  • Proceed with adding HAWQ via Ambari as a new service
  • During the "Customize Services" phase, enter port number 10432 or anything beyond linux internal ports as 5432 is reserved by Ambari for storing its metadata, in postgres database.
  • Proceed with configuration and deploy the setup should complete, however, with warnings
NOTE: HAWQ tries to initialize the cluster with default/hardcoded parallel connections and shared buffers which are 3000 and 4000 by default. 
  • Manually initialize HAWQ from command line by reducing the shared_buffers and max_connections as gpadmin user
[root@sandbox stage]# su - gpadmin
[gpadmin@sandbox ~]$ hawq init cluster --max_connections 15 --shared_buffers 500
  • This should bring up the cluster which can be tested and tried out.
[gpadmin@sandbox ~]$ hawq init cluster --max_connections 15 --shared_buffers 500
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-Prepare to do 'hawq init'
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-You can find log in:
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_init_20160615.log
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-GPHOME is set to:
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-/usr/local/hawq/.
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init hawq with args: ['init', 'cluster']


Continue with HAWQ init Yy|Nn (default=N):
> y
20160615:18:32:25:046085 hawq_init:sandbox:gpadmin-[INFO]:-No standby host configured, skip it
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-Check if hdfs path is available
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[WARNING]:-2016-06-15 18:32:26.024369, p46198, th140320952715232, WARNING the number of nodes in pipeline is 1 [sandbox.hortonworks.com(172.16.105.137)], is less than the expected number of replica 3 for block [block pool ID: BP-267552868-172.16.137.143-1457691099567 block ID 1073742404_1585] file /hawq_default/testFile
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-1 segment hosts defined
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-Set default_hash_table_bucket_number as: 6
20160615:18:32:31:046085 hawq_init:sandbox:gpadmin-[INFO]:-Start to init master node: 'sandbox.hortonworks.com'
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-20160615:18:32:39:046409 hawqinit.sh:sandbox:gpadmin-[INFO]:-Loading hawq_toolkit...
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Master init successfully
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init segments in list: ['sandbox.hortonworks.com']
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Total segment number is: 1
.........
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-1 of 1 segments init successfully
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-Segments init successfully on nodes '['sandbox.hortonworks.com']'
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init HAWQ cluster successfully
  • Verifying HAWQ database access by creating a database and a table within
[gpadmin@sandbox ~]$ psql template1
psql (8.2.15)
Type "help" for help.


template1=# create database mydb;
CREATE DATABASE
template1=# \c mydb
You are now connected to database "mydb" as user "gpadmin".
mydb=# 
mydb=# 
mydb=# CREATE TABLE mytable (col1 int, col2 int, col3 int);
CREATE TABLE
mydb=# INSERT INTO mytable select i,i,i from generate_series(0,1000)i;
INSERT 0 1001
mydb=# SELECT count(*) from mytable;
 count 
-------
  1001
(1 row)

mydb=# 
  • At this point it should be manageable by Ambari
3,200 Views
0 Kudos