In this article, we will install Apache HAWQ 2.0.0.0_beta in a cluster composed of:

2 masters (1 active, 1 standby)

3 segments (slaves)

On each node of the cluster, install the repository used to fetch libhdfs3:

[root@hdpmaster01~]# curl -s -L "https://bintray.com/wangzw/rpm/rpm" -o /etc/yum.repos.d/bintray-wangzw-rpm.repo
[root@hdpmaster01~]#

Install the EPEL repository:

[root@hdpmaster01~]# yum -y install epel-release

Install missing dependencies:

[root@hdpmaster01~]# yum -y install man passwd sudo tar which git mlocate links make bzip2 net-tools autoconf automake libtool m4 gcc gcc-c++ gdb bison flex cmake gperf maven indent libuuid-devel krb5-devel libgsasl-devel expat-devel libxml2-devel perl-ExtUtils-Embed pam-devel python-devel libcurl-devel snappy-devel thrift-devel libyaml-devel libevent-devel bzip2-devel openssl-devel openldap-devel protobuf-devel readline-devel net-snmp-devel apr-devel libesmtp-devel xerces-c-devel python-pip json-c-devel libhdfs3-devel apache-ivy java-1.7.0-openjdk-devel openssh-clients openssh-server

Install postgresql-devel to compile the Python dependencies:

[root@hdpmaster01~]# yum install postgresql-devel

Now, install the Python dependencies with pip:

pip install pg8000 simplejson unittest2 pycrypto pygresql pyyaml lockfile paramiko psi

You can now remove the postgresql-* packages; just be careful not to remove any existing PostgreSQL instance you still need.
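One cautious way to do this (a sketch, not the article's exact commands) is to list what the wildcard would match before removing anything:

```shell
# List the installed postgresql-* packages first, so you can see exactly
# what a removal would touch on this particular host:
rpm -qa 'postgresql*'

# If the list shows only the build-time packages installed above (e.g.
# postgresql-devel and the libraries it pulled in), it is safe to remove
# them; do NOT run this if the list includes a server you rely on.
yum -y remove postgresql-devel
```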

Download the source code from GitHub:

[root@hdpmaster01~]# cd /root
[root@hdpmaster01~]# git clone https://github.com/apache/incubator-hawq.git
Cloning into 'incubator-hawq'...
remote: Counting objects: 34883, done.
remote: Total 34883 (delta 0), reused 0 (delta 0), pack-reused 34883
Receiving objects: 100% (34883/34883), 144.95 MiB | 30.04 MiB/s, done.
Resolving deltas: 100% (21155/21155), done.
[root@hdpmaster01~]#

Before compiling HAWQ, you need to compile and install libyarn, the C/C++ interface to YARN, which is shipped with the HAWQ source code:

[root@hdpmaster01~]# cd /root/incubator-hawq/depends/libyarn/ && mkdir build/ && cd build
[root@hdpmaster01 build]# pwd
/root/incubator-hawq/depends/libyarn/build
[root@hdpmaster01 build]#
[root@hdpmaster01 build]# ../bootstrap
[...]
bootstrap success.
Run "make" to build.
[root@hdpmaster01 build]# make -j && make install
[...]
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/records/YARN_containermanagement_protocol.pb.h
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/records/YARNSecurity.pb.h
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/libyarncommon/Token.h
[root@hdpmaster01 build]#

Copy the include and lib directories to the proper filesystem paths, and make the library visible to the operating system with ldconfig:

[root@hdpmaster01 build]# cp -R /root/incubator-hawq/depends/libyarn/dist/include/libyarn/ /usr/include/
[root@hdpmaster01 build]# cp /root/incubator-hawq/depends/libyarn/dist/lib/libyarn.so.0.1.13 /usr/lib64/
[root@hdpmaster01 build]# 
[root@hdpmaster01 build]# ln -s /usr/lib64/libyarn.so.0.1.13 /usr/lib64/libyarn.so.1
[root@hdpmaster01 build]# ln -s /usr/lib64/libyarn.so.1 /usr/lib64/libyarn.so
[root@hdpmaster01 build]# ldconfig && ldconfig -p | grep libyarn
	libyarn.so.1 (libc6,x86-64) => /lib64/libyarn.so.1
	libyarn.so (libc6,x86-64) => /lib64/libyarn.so
[root@hdpmaster01 build]#

Now we can compile and install Apache HAWQ. I use /opt/hawq as the installation directory:

[root@hdpmaster01 build]# cd /root/incubator-hawq
[root@hdpmaster01 incubator-hawq]# ./configure --prefix=/opt/hawq
[...]
[root@hdpmaster01 incubator-hawq]# make -j8 && make install
[...]
make[2]: Leaving directory `/root/incubator-hawq/tools/gpnetbench'
make[1]: Leaving directory `/root/incubator-hawq/tools'
HAWQ installation complete.

Create the hawqadmin user and change the ownership of the HAWQ installation directory:

[root@hdpmaster01 incubator-hawq]# useradd -s /bin/bash hawqadmin
[root@hdpmaster01 incubator-hawq]# passwd hawqadmin
Changing password for user hawqadmin.
New password: 
Retype new password:
passwd: all authentication tokens updated successfully.
[root@hdpmaster01 incubator-hawq]# chown -R hawqadmin.hawqadmin /opt/hawq/
[root@hdpmaster01 incubator-hawq]# 

Repeat the previous steps on all hosts in your cluster.
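If you prefer not to log in to each host by hand, the repetition can be scripted. A hedged sketch, assuming root SSH access to the other nodes (host names are from this article's example cluster; the long dependency line is elided and must be pasted in from above):

```shell
# Hypothetical helper: repeat the repository setup and package
# installation on the remaining nodes over SSH.
for h in hdpmaster02 hdpslave01 hdpslave02 hdpslave03; do
  ssh "root@$h" '
    curl -s -L "https://bintray.com/wangzw/rpm/rpm" -o /etc/yum.repos.d/bintray-wangzw-rpm.repo
    yum -y install epel-release
    # ...paste the full "yum -y install ..." dependency line from above here...
  '
done
```

The compile-and-install steps (libyarn, configure, make install, user creation) then need the same treatment on each node.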

Now, on the primary master, create an SSH key pair for the hawqadmin user and distribute the public key to the other hosts (do not set a passphrase for your private key).

As hawqadmin user:

[hawqadmin@hdpmaster01~]$ ssh-keygen 
[...]
[hawqadmin@hdpmaster01~]$ for i in hdpmaster01 hdpmaster02 hdpslave01 hdpslave02 hdpslave03; do
> ssh-copy-id $i
>done
[...]
[hawqadmin@hdpmaster01~]$

Repeat the previous loop on the standby master.

On the primary master host, edit /opt/hawq/etc/hdfs-client.xml and /opt/hawq/etc/yarn-client.xml to fit your needs (e.g., for NameNode and ResourceManager high availability, or for Kerberos authentication), then edit the following properties in hawq-site.xml:

    <property>
        <name>hawq_master_address_host</name>
        <value>hdpmaster01</value>
        <description>The host name of hawq master.</description>
    </property>

    <property>
        <name>hawq_master_address_port</name>
        <value>5432</value>
        <description>The port of hawq master.</description>
    </property>

    <property>
        <name>hawq_standby_address_host</name>
        <value>hdpmaster02</value>
        <description>The host name of hawq standby master.</description>
    </property>

    <property>
        <name>hawq_segment_address_port</name>
        <value>40000</value>
        <description>The port of hawq segment.</description>
    </property>

    <property>
        <name>hawq_dfs_url</name>
        <value>hdfsha/hawq_default</value>
        <description>URL for accessing HDFS.</description>
    </property>

    <property>
        <name>hawq_master_directory</name>
        <value>/data01/hawq/masterdd</value>
        <description>The directory of hawq master.</description>
    </property>

    <property>
        <name>hawq_segment_directory</name>
        <value>/data01/hawq/segmentdd</value>
        <description>The directory of hawq segment.</description>
    </property>

    <property>
        <name>hawq_global_rm_type</name>
        <value>yarn</value>
    </property>

    <property>
        <name>hawq_rm_yarn_address</name>
        <value>hdpmaster02:8032</value>
    </property>

    <property>
        <name>hawq_rm_yarn_scheduler_address</name>
        <value>hdpmaster02:8030</value>
    </property>

    <property>
        <name>hawq_rm_yarn_queue_name</name>
        <value>default</value>
        <description>The YARN queue name to register hawq resource manager.</description>
    </property>

    <property>
        <name>hawq_rm_yarn_app_name</name>
        <value>hawq</value>
        <description>The application name to register hawq resource manager in YARN.</description>
    </property>

You can leave the other options unchanged.

NOTE: if you have a PostgreSQL instance already running on the master nodes, you must change the hawq_master_address_port property.
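For example, to move the HAWQ master off the default PostgreSQL port (5433 below is just an arbitrary free port; pick any port not in use on your masters):

    <property>
        <name>hawq_master_address_port</name>
        <value>5433</value>
        <description>The port of hawq master.</description>
    </property>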

Write the slaves' FQDNs in the /opt/hawq/etc/slaves file, e.g.:

[hawqadmin@hdpmaster01 etc]$ echo -e "hdpslave01\nhdpslave02\nhdpslave03" > slaves 
[hawqadmin@hdpmaster01 etc]$ cat slaves 
hdpslave01
hdpslave02
hdpslave03

Copy the configuration files to all the other hosts, into the /opt/hawq/etc/ directory.
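The article gives no single command for this step; one way to script it is a loop like the following sketch (host names and file list are from this article's example cluster, so adjust them to yours). By default it only prints each scp command for review; run it with COPY=scp to actually copy the files.

```shell
# Hosts and config files from this article's example cluster -- adjust to yours.
HOSTS="hdpmaster02 hdpslave01 hdpslave02 hdpslave03"
CONF_FILES="hawq-site.xml slaves hdfs-client.xml yarn-client.xml"

# COPY defaults to a dry run that just prints each command;
# invoke with COPY=scp to perform the real copies.
COPY="${COPY:-echo scp}"

for h in $HOSTS; do
  for f in $CONF_FILES; do
    $COPY "/opt/hawq/etc/$f" "${h}:/opt/hawq/etc/"
  done
done
```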

Now, as the hdfs user, create the hawqadmin home directory and the data directory on HDFS:

[hdfs@hdpmaster01~]$ hdfs dfs -mkdir /user/hawqadmin && hdfs dfs -chown hawqadmin /user/hawqadmin
[hdfs@hdpmaster01~]$ hdfs dfs -mkdir /hawq_default && hdfs dfs -chown hawqadmin /hawq_default
[hdfs@hdpmaster01~]$

On both masters, create the master data dir:

mkdir -p /data01/hawq/masterdd && chown -R hawqadmin /data01/hawq

Create the segment data directory on all slaves:

mkdir -p /data01/hawq/segmentdd && chown -R hawqadmin /data01/hawq

Initialize the cluster as the hawqadmin user. Remember to source the environment file (/opt/hawq/greenplum_path.sh) before executing any action:

[hawqadmin@hdpmaster01 hawq]$ cd /opt/hawq/
[hawqadmin@hdpmaster01 hawq]$ source greenplum_path.sh 
[hawqadmin@hdpmaster01 hawq]$ hawq init cluster -av
[...]
20160229:15:42:40:158114 hawq_init:hdpmaster01:hawqadmin-[INFO]:-Init HAWQ cluster successfully
[hawqadmin@hdpmaster01 hawq]$ 

The init command also starts the cluster, so you can now check the cluster state with the following command:

hawq state cluster

You can also see the HAWQ application running on YARN:

[hawqadmin@hdpmaster01 ~]$ yarn application -list | awk '/application_/ {printf ("%s\t%s\t%s\t%s\t%s\n", $1,$2,$3,$4,$5)}'

application_1456240841318_0026    hawq    YARN    hawqadmin    default

Now, connect to the database and create a sample table:

[hawqadmin@hdpmaster01 hawq]$ psql -d postgres
psql (8.2.15)
Type "help" for help.
postgres=# \d
No relations found.
postgres=# create table test (field1 int, field2 varchar(30));
CREATE TABLE
postgres=# \d+ test
                  Append-Only Table "public.test"
 Column |         Type          | Modifiers | Storage  | Description 
--------+-----------------------+-----------+----------+-------------
 field1 | integer               |           | plain    | 
 field2 | character varying(30) |           | extended | 
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Has OIDs: no
Options: appendonly=true
Distributed randomly
postgres=# insert into test (field1, field2) values (1, 'May the hawq be with you');
INSERT 0 1
postgres=# select * from test;
 field1 |          field2           
--------+---------------------------
      1 | May the hawq be with you
(1 row)
postgres=# 


That's all! 🙂

Comments
New Contributor

Hi Davide,

Thanks for the post. I'm trying to stand this up on a small cluster on AWS - 1 master/4 workers. I've made it through the build and install but am getting the following error during the "hawq init cluster -av"

From the /home/hawqadmin/hawqAdminLogs/hawq_init_20160309.log

.........
20160309:11:40:30:005594 hawq_init:ip-172-31-31-145:hawqadmin-[DEBUG]:-Check hdfs: /opt/hawq/bin/gpcheckhdfs hdfs ip-172-31-31-145.ec2.internal:8020/hawq_default off
20160309:11:40:30:005594 hawq_init:ip-172-31-31-145:hawqadmin-[INFO]:-4 segment hosts defined
20160309:11:40:30:005594 hawq_init:ip-172-31-31-145:hawqadmin-[INFO]:-Set default_segment_num as: 32
20160309:11:40:42:005594 hawq_init:ip-172-31-31-145:hawqadmin-[INFO]:-Start to init master node: 'ip-172-31-31-145.ec2.internal'
The files belonging to this database system will be owned by user "hawqadmin".
This user must also own the server process.


The database cluster will be initialized with locale en_US.utf8.


fixing permissions on existing directory /data01/hawq/masterdd ... ok
creating subdirectories ... ok
selecting default max_connections ... initdb: error 256 from: "/opt/hawq/bin/postgres" --boot -x0 -F -c max_connections=1280 -c shared_buffers=4000 -c max_fsm_pages=200000 < "/dev/null" > "/home/hawqadmin/hawqAdminLogs/master.initdb" 2>&1
initdb: removing contents of data directory "/data01/hawq/masterdd"
Master postgres initdb failed
20160309:11:40:42:005594 hawq_init:ip-172-31-31-145:hawqadmin-[INFO]:-Master postgres initdb failed
20160309:11:40:42:005594 hawq_init:ip-172-31-31-145:hawqadmin-[ERROR]:-Master init failed, exit

This is running on Centos7 and I found the following JIRA in BigTop: https://issues.apache.org/jira/browse/BIGTOP-2325 Any ideas on a work around?

Thanks!

Ryan

Expert Contributor

Hi Ryan,

Could you check if you have the right permissions on the local directory?

[hawqadmin@hdpmaster01 ~]$ ls -ld /data01/hawq/masterdd/
drwx------ 16 hawqadmin hadoop 4096 Mar  1 09:19 /data01/hawq/masterdd/
[hawqadmin@hdpmaster01 ~]$ ls -l /data01/hawq/masterdd/
total 40
drwx------ 5 hawqadmin hawqadmin    38 Feb 29 15:38 base
drwx------ 2 hawqadmin hawqadmin  4096 Mar  1 09:19 global
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_changetracking
drwx------ 2 hawqadmin hawqadmin    17 Feb 29 15:38 pg_clog
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_distributedlog
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_distributedxidmap
-rw-rw-r-- 1 hawqadmin hawqadmin  4021 Feb 29 15:38 pg_hba.conf
-rw------- 1 hawqadmin hawqadmin  1636 Feb 29 15:38 pg_ident.conf
drwx------ 2 hawqadmin hawqadmin   156 Mar  1 00:00 pg_log
drwx------ 4 hawqadmin hawqadmin    34 Feb 29 15:38 pg_multixact
drwx------ 2 hawqadmin hawqadmin     6 Mar  1 09:19 pg_stat_tmp
drwx------ 2 hawqadmin hawqadmin    17 Feb 29 15:38 pg_subtrans
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_tblspc
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_twophase
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_utilitymodedtmredo
-rw------- 1 hawqadmin hawqadmin     4 Feb 29 15:38 PG_VERSION
drwx------ 3 hawqadmin hawqadmin    58 Feb 29 15:38 pg_xlog
-rw------- 1 hawqadmin hawqadmin 18393 Feb 29 15:38 postgresql.conf
-rw------- 1 hawqadmin hawqadmin   104 Feb 29 15:40 postmaster.opts
[hawqadmin@hdpmaster01 ~]$ 

Also, what are the permissions on the directory on hdfs?

[hawqadmin@hdpmaster01 ~]$ hdfs dfs -ls -d /hawq_default
drwxr-xr-x   - hawqadmin hdfs          0 2016-02-29 15:38 /hawq_default
[hawqadmin@hdpmaster01 ~]$ hdfs dfs -ls -R /hawq_default
drwx------   - hawqadmin hdfs          0 2016-02-29 15:47 /hawq_default/16385
drwx------   - hawqadmin hdfs          0 2016-03-01 08:54 /hawq_default/16385/16387
drwx------   - hawqadmin hdfs          0 2016-03-01 08:55 /hawq_default/16385/16387/16513
-rw-------   3 hawqadmin hdfs         48 2016-03-01 08:55 /hawq_default/16385/16387/16513/1
-rw-------   3 hawqadmin hdfs          4 2016-02-29 15:47 /hawq_default/16385/16387/PG_VERSION
[hawqadmin@hdpmaster01 ~]$
New Contributor

Hi!

Thanks for the article, the one I was using (the CentOS 7 version) mostly omitted the ldconfig part I got stuck on. Although when installing hawq I got some errors: http://pastebin.com/E3heLzS3

Do you have any advice how I could fix this? Thanks!

Tamás

Expert Contributor

Hi Tamás, I see you are using openjdk 1.7. Try using openjdk 1.8 instead.

Davide

New Contributor

Thanks for the response! Although yesterday I just ended up redoing the whole process based on your article, and it installed flawlessly; everything is fine!

Best regards, Tamás