Community Articles

Find and share helpful community-sourced technical articles.
avatar
Guru

Goal

The goal of of this article is to get familiar with the most convenient method of accessing HAWQ Database i.e. using psql native client

Assumptions

  • HAWQ Database is available and active

PSQL Native Client (Available for download @ https://network.pivotal.io)

PSQL or "psql" (the binary command) is the ideal choice to connect to HAWQ database/warehouse because HAWQ itself is based on Postgres database. Steps to configure psql on Client/Edge node:

  • Make a note of the Public IP address for the Client/Edge Node, this will be needed later.
  • Access to HAWQ is available by default from HAWQ master host, make a note of HAWQ database name and user you would want to use for enabling database access.
    • To identify name of database (psql -l) from the gpadmin user prompt in linux
    • To identify available user names "psql -c \du"
  • Download the HAWQ installation binaries from the following site https://network.pivotal.io/products/pivotal-hdb#/releases/3466/file_groups/520
  • Follow this guide to setup the HAWQ repository

http://hdb.docs.pivotal.io/211/hdb/install/setup-hdb-repos.html

  • Install psql binary, this should be available from the repo we just set up
[root@sandbox ~]# yum install hawq
  • Now the client is installed, next ensure that HAWQ listens to this edge node. To do this, you need to identify the data directory for HAWQ master, here is how we can do that.
  • Login to HAWQ master host via SSH as gpadmin user & execute the following command
[root@sandbox ~]# ps aux | egrep silent
gpadmin  209268  0.1  3.5 591072 357720 ?       Ss   03:02   0:00 /usr/local/hawq_2_1_0_0/bin/postgres -D /data/hawq/master -i -M master -p 5433 --silent-mode=true
gpadmin  209428  0.2  4.0 693584 407636 ?       Ss   03:02   0:01 /usr/local/hawq_2_1_0_0/bin/postgres -D /data/hawq/segment -i -M segment -p 40000 --silent-mode=true
root     212230  0.0  0.0   8456   924 pts/1    S+   03:12   0:00 egrep silent

NOTE: You might see an output like above, this is because the demo setup was on a sandbox, so both master and segment nodes are on the same sandbox. We need to look at the path which says "master", in this case "/data/hawq/master.

  • Change directory to master data folder and make a copy of the file called pg_hba.conf
[root@sandbox ~]# cd /data/hawq/master
[root@sandbox master]# cp pg_hba.conf pg_hba.conf.orig
  • File "pg_hba.conf" will already have entries like the following
local    all         gpadmin         ident
host     all         gpadmin         127.0.0.1/28    trust
host  all     gpadmin    172.16.105.183/32       trust
  1. Here "host" or "local" specifies if the client entry point is local (on same host) or "host" from a different node (either within the same cluster or on the LAN)
  2. "all" basically allows access to all the databases, this can be specific, for instance, "hdw" provided that is the name of your database
  3. "gpadmin" is the default superuser for HAWQ cluster, this user has access to the cluster (within specified IP ranges)
  4. IP address is either the 32-bit absolute IP address or could be opened to a subnet
  5. "trust" is the method

Now to understand the last entry in the above example, we are providing access for from any "host", provided "all"/any database only when the user is "gpadmin" trying to connect from client which has the IP address "172.16.105.183" and method is "trust", so there is no password required. To understand pg_hba.conf better, here is the link: https://www.postgresql.org/docs/8.3/static/auth-pg-hba-conf.html

  • Now assuming that your edge node has an IP address of "172.26.78.183", database name is "test", username is "myuser", and you want to use securely use minimally secure login method "md5" then the entry would look something like this.
host  test     myuser    172.26.78.183/32       md5
  • Once the entry is made, we need to ensure that HAWQ is infact open to listening to remote hosts. A "*" signifies that HAWQ can listen to all remote hosts.
[gpadmin@sandbox ~]$ egrep -i --color listen_address /data/hawq/master/postgresql.conf 
#listen_addresses = '*'		# what IP address(es) to listen on;
  • Now ensure that our entry in pg_hba.conf file is cached by HAWQ process, hence, HAWQ will be able to restrict the creation new processes. (We need to be gpadmin user)
[gpadmin@sandbox ~]$ /usr/local/hawq/bin/pg_ctl reload -D /data/hawq/master
server signaled
[gpadmin@sandbox ~]$
  • You should be able to connect using the psql now
[gpadmin@sandbox ~]$ psql -U myuser -h 172.16.105.183 -d test 
Password for user myuser: 
psql (8.2.15)
Type "help" for help.

test=> 
2,088 Views
Comments
avatar
Contributor

Some corrections.

1. You shouldn't use the PostgreSQL version of psql with HAWQ. While it may work, you should use the one distributed with HAWQ. You'll also want other utilities like gpfdist which are distributed with HAWQ and not part of PostgreSQL. You'll instead want to use rpm to install the utilities on an edge node.

rpm -i hawq-2.1.1.0-7.el6.x86_64.rpm

2. Do not use pg_ctl with HAWQ. You should either use Ambari to restart the HAWQ service or use the "hawq" command in a terminal window. pg_ctl will probably stop being distributed with HAWQ in the future.

hawq stop cluster -u -a

-u means to update the config

-a means to do it silently

3. An easier way to allow external connections is to add this to the end of the pg_hba.conf file:

host all all 0.0.0.0/0 md5

This means that all external connections will require an encrypted password to authenticate. This is the database password too and not the operating system password.

psql -c "alter user gpadmin password 'secret'"

That will change the gpadmin password in the database to secret.