Created on 03-01-2016 11:57 AM
In this article, we will install Apache HAWQ 2.0.0.0_beta on a cluster composed of:
2 masters (1 active, 1 standby)
3 segments (slaves)
On each node of the cluster, install the repository needed to fetch libhdfs3:
[root@hdpmaster01~]# curl -s -L "https://bintray.com/wangzw/rpm/rpm" -o /etc/yum.repos.d/bintray-wangzw-rpm.repo
Install the EPEL repository:
[root@hdpmaster01~]# yum -y install epel-release
Install missing dependencies:
[root@hdpmaster01~]# yum -y install man passwd sudo tar which git mlocate links make bzip2 net-tools autoconf automake libtool m4 gcc gcc-c++ gdb bison flex cmake gperf maven indent libuuid-devel krb5-devel libgsasl-devel expat-devel libxml2-devel perl-ExtUtils-Embed pam-devel python-devel libcurl-devel snappy-devel thrift-devel libyaml-devel libevent-devel bzip2-devel openssl-devel openldap-devel protobuf-devel readline-devel net-snmp-devel apr-devel libesmtp-devel xerces-c-devel python-pip json-c-devel libhdfs3-devel apache-ivy java-1.7.0-openjdk-devel openssh-clients openssh-server
Install postgresql-devel, needed to compile the Python dependencies:
[root@hdpmaster01~]# yum install postgresql-devel
Now, install python dependencies with pip:
pip install pg8000 simplejson unittest2 pycrypto pygresql pyyaml lockfile paramiko psi
You can now remove the postgresql-* packages; just be sure not to remove any existing PostgreSQL instances.
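Before removing anything, it can help to see exactly which postgresql packages are present. A cautious sketch (package names may differ on your system) that reviews the list first and removes only the build-time package:

```shell
# List installed postgresql packages before removing anything, so a running
# PostgreSQL instance is not swept away together with postgresql-devel.
installed=$(rpm -qa 'postgresql*' 2>/dev/null || true)
msg="${installed:-no postgresql packages found}"
echo "$msg"
# After reviewing the list, remove only the build-time package:
# yum -y remove postgresql-devel
```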
Download the source code from GitHub:
[root@hdpmaster01~]# cd /root
[root@hdpmaster01~]# git clone https://github.com/apache/incubator-hawq.git
Cloning into 'incubator-hawq'...
remote: Counting objects: 34883, done.
remote: Total 34883 (delta 0), reused 0 (delta 0), pack-reused 34883
Receiving objects: 100% (34883/34883), 144.95 MiB | 30.04 MiB/s, done.
Resolving deltas: 100% (21155/21155), done.
Before compiling HAWQ, you need to compile and install libyarn, the C/C++ interface to YARN, which is shipped with the HAWQ source code:
[root@hdpmaster01~]# cd /root/incubator-hawq/depends/libyarn/ && mkdir build/ && cd build
[root@hdpmaster01 build]# pwd
/root/incubator-hawq/depends/libyarn/build
[root@hdpmaster01 build]# ../bootstrap
[...]
bootstrap success. Run "make" to build.
[root@hdpmaster01 build]# make -j && make install
[...]
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/records/YARN_containermanagement_protocol.pb.h
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/records/YARNSecurity.pb.h
-- Installing: /root/incubator-hawq/depends/libyarn/dist/include/libyarn/libyarncommon/Token.h
Copy the include and lib directories to the correct file system paths, and make the library visible to the operating system with ldconfig:
[root@hdpmaster01 build]# cp -R /root/incubator-hawq/depends/libyarn/dist/include/libyarn/ /usr/include/
[root@hdpmaster01 build]# cp /root/incubator-hawq/depends/libyarn/dist/lib/libyarn.so.0.1.13 /usr/lib64/
[root@hdpmaster01 build]# ln -s /usr/lib64/libyarn.so.0.1.13 /usr/lib64/libyarn.so.1
[root@hdpmaster01 build]# ln -s /usr/lib64/libyarn.so.1 /usr/lib64/libyarn.so
[root@hdpmaster01 build]# ldconfig && ldconfig -p | grep libyarn
	libyarn.so.1 (libc6,x86-64) => /lib64/libyarn.so.1
	libyarn.so (libc6,x86-64) => /lib64/libyarn.so
Now we can compile and install Apache HAWQ. I use /opt/hawq as the installation directory:
[root@hdpmaster01 build]# cd /root/incubator-hawq
[root@hdpmaster01 incubator-hawq]# ./configure --prefix=/opt/hawq
[...]
[root@hdpmaster01 incubator-hawq]# make -j8 && make install
[...]
make[2]: Leaving directory `/root/incubator-hawq/tools/gpnetbench'
make[1]: Leaving directory `/root/incubator-hawq/tools'
HAWQ installation complete.
Create the user hawqadmin and change the ownership of the HAWQ installation directory:
[root@hdpmaster01 incubator-hawq]# useradd -s /bin/bash hawqadmin
[root@hdpmaster01 incubator-hawq]# passwd hawqadmin
Changing password for user hawqadmin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@hdpmaster01 incubator-hawq]# chown -R hawqadmin.hawqadmin /opt/hawq/
Repeat the previous steps on all hosts in your cluster.
Now, on the primary master, create the key pair for the hawqadmin user and distribute the public key to the other hosts (do not set a passphrase for your private key).
As hawqadmin user:
[hawqadmin@hdpmaster01~]$ ssh-keygen
[...]
[hawqadmin@hdpmaster01~]$ for i in hdpmaster01 hdpmaster02 hdpslave01 hdpslave02 hdpslave03; do
> ssh-copy-id $i
> done
[...]
Repeat the previous loop on the standby master.
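Once the keys are in place, it is worth verifying that passwordless SSH really works from both masters. A small sketch (hostnames as used in this article) that reports a per-host status instead of aborting on the first failure:

```shell
# Verify passwordless SSH to every cluster host; BatchMode makes ssh fail
# immediately instead of prompting for a password.
HOSTS="hdpmaster01 hdpmaster02 hdpslave01 hdpslave02 hdpslave03"
status=""
for h in $HOSTS; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
    status="$status $h:ok"
  else
    status="$status $h:unreachable"
  fi
done
echo "$status"
```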
On the primary master host, edit /opt/hawq/etc/hdfs-client.xml and /opt/hawq/etc/yarn-client.xml to fit your needs (e.g. for NameNode and ResourceManager high availability, or for Kerberos authentication), then edit the following properties in hawq-site.xml:
<property>
    <name>hawq_master_address_host</name>
    <value>hdpmaster01</value>
    <description>The host name of hawq master.</description>
</property>
<property>
    <name>hawq_master_address_port</name>
    <value>5432</value>
    <description>The port of hawq master.</description>
</property>
<property>
    <name>hawq_standby_address_host</name>
    <value>hdpmaster02</value>
    <description>The host name of hawq standby master.</description>
</property>
<property>
    <name>hawq_segment_address_port</name>
    <value>40000</value>
    <description>The port of hawq segment.</description>
</property>
<property>
    <name>hawq_dfs_url</name>
    <value>hdfsha/hawq_default</value>
    <description>URL for accessing HDFS.</description>
</property>
<property>
    <name>hawq_master_directory</name>
    <value>/data01/hawq/masterdd</value>
    <description>The directory of hawq master.</description>
</property>
<property>
    <name>hawq_segment_directory</name>
    <value>/data01/hawq/segmentdd</value>
    <description>The directory of hawq segment.</description>
</property>
<property>
    <name>hawq_global_rm_type</name>
    <value>yarn</value>
</property>
<property>
    <name>hawq_rm_yarn_address</name>
    <value>hdpmaster02:8032</value>
</property>
<property>
    <name>hawq_rm_yarn_scheduler_address</name>
    <value>hdpmaster02:8030</value>
</property>
<property>
    <name>hawq_rm_yarn_queue_name</name>
    <value>default</value>
    <description>The YARN queue name to register hawq resource manager.</description>
</property>
<property>
    <name>hawq_rm_yarn_app_name</name>
    <value>hawq</value>
    <description>The application name to register hawq resource manager in YARN.</description>
</property>
You can leave the other options unchanged.
NOTE: if you have a PostgreSQL instance running on the master nodes, you must change the hawq_master_address_port property.
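A quick way to check whether the default port is already taken before settling on a value for hawq_master_address_port (a sketch using ss; 5432 is the port from hawq-site.xml above):

```shell
# Report whether anything is already listening on the HAWQ master port.
PORT=5432
if ss -lnt 2>/dev/null | grep -q ":$PORT "; then
  port_state="busy"
else
  port_state="free"
fi
echo "port $PORT is $port_state"
```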
Write the slaves' FQDNs in the /opt/hawq/etc/slaves file, e.g.:
[hawqadmin@hdpmaster01 etc]$ echo -e "hdpslave01\nhdpslave02\nhdpslave03" > slaves
[hawqadmin@hdpmaster01 etc]$ cat slaves
hdpslave01
hdpslave02
hdpslave03
Copy the configuration files to the /opt/hawq/etc/ directory on all other hosts.
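One way to do this is a small scp loop; a sketch assuming the paths and hostnames used in this article (DRY_RUN=1 only prints the commands so you can review them before copying):

```shell
# Push the edited configuration files to the standby master and every slave.
DRY_RUN=1                      # unset to actually copy
CONF=/opt/hawq/etc
HOSTS="hdpmaster02 hdpslave01 hdpslave02 hdpslave03"
cmds=""
for h in $HOSTS; do
  cmd="scp $CONF/hawq-site.xml $CONF/hdfs-client.xml $CONF/yarn-client.xml $CONF/slaves $h:$CONF/"
  cmds="$cmds$cmd
"
  if [ -n "$DRY_RUN" ]; then echo "$cmd"; else $cmd; fi
done
```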
Now, as the hdfs user, create the hawqadmin home directory and the data directory on HDFS:
[hdfs@hdpmaster01~]$ hdfs dfs -mkdir /user/hawqadmin && hdfs dfs -chown hawqadmin /user/hawqadmin
[hdfs@hdpmaster01~]$ hdfs dfs -mkdir /hawq_default && hdfs dfs -chown hawqadmin /hawq_default
On both masters, create the master data dir:
mkdir -p /data01/hawq/masterdd && chown -R hawqadmin /data01/hawq
Create the segment data directory on all slaves:
mkdir -p /data01/hawq/segmentdd && chown -R hawqadmin /data01/hawq
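If root SSH access to the slaves is available, the slave-side step can be scripted from the primary master; a sketch with the hostnames from this article that reports unreachable hosts instead of aborting:

```shell
# Create the segment data directory on every slave in one pass.
SLAVES="hdpslave01 hdpslave02 hdpslave03"
results=""
for h in $SLAVES; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" \
       'mkdir -p /data01/hawq/segmentdd && chown -R hawqadmin /data01/hawq' 2>/dev/null
  then
    results="$results $h:done"
  else
    results="$results $h:failed"
  fi
done
echo "$results"
```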
Initialize the cluster as the hawqadmin user. Remember to source the environment file (/opt/hawq/greenplum_path.sh) before executing any command:
[hawqadmin@hdpmaster01 hawq]$ cd /opt/hawq/
[hawqadmin@hdpmaster01 hawq]$ source greenplum_path.sh
[hawqadmin@hdpmaster01 hawq]$ hawq init cluster -av
[...]
20160229:15:42:40:158114 hawq_init:hdpmaster01:hawqadmin-[INFO]:-Init HAWQ cluster successfully
The init command also starts the cluster, so you can now check the cluster state with the following command:
hawq state cluster
You can also see the running application in YARN:
[hawqadmin@hdpmaster01 ~]$ yarn application -list | awk '/application_/ {printf ("%s\t%s\t%s\t%s\t%s\n", $1,$2,$3,$4,$5)}'
application_1456240841318_0026	hawq	YARN	hawqadmin	default
Now, connect to the database and create a sample table:
[hawqadmin@hdpmaster01 hawq]$ psql -d postgres
psql (8.2.15)
Type "help" for help.

postgres=# \d
No relations found.
postgres=# create table test (field1 int, field2 varchar(30));
CREATE TABLE
postgres=# \d+ test
            Append-Only Table "public.test"
 Column |          Type          | Modifiers | Storage  | Description
--------+------------------------+-----------+----------+-------------
 field1 | integer                |           | plain    |
 field2 | character varying(30)  |           | extended |
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Has OIDs: no
Options: appendonly=true
Distributed randomly
postgres=# insert into test (field1, field2) values (1, 'May the hawq be with you');
INSERT 0 1
postgres=# select * from test;
 field1 |          field2
--------+---------------------------
      1 | May the hawq be with you
(1 row)
That's all! 🙂
Created on 03-09-2016 05:55 PM
Hi Davide,
Thanks for the post. I'm trying to stand this up on a small cluster on AWS - 1 master/4 workers. I've made it through the build and install but am getting the following error during the "hawq init cluster -av"
From the /home/hawqadmin/hawqAdminLogs/hawq_init_20160309.log
.........
20160309:11:40:30:005594 hawq_init:ip-172-31-31-145:hawqadmin-[DEBUG]:-Check hdfs: /opt/hawq/bin/gpcheckhdfs hdfs ip-172-31-31-145.ec2.internal:8020/hawq_default off
20160309:11:40:30:005594 hawq_init:ip-172-31-31-145:hawqadmin-[INFO]:-4 segment hosts defined
20160309:11:40:30:005594 hawq_init:ip-172-31-31-145:hawqadmin-[INFO]:-Set default_segment_num as: 32
20160309:11:40:42:005594 hawq_init:ip-172-31-31-145:hawqadmin-[INFO]:-Start to init master node: 'ip-172-31-31-145.ec2.internal'
The files belonging to this database system will be owned by user "hawqadmin".
This user must also own the server process.
The database cluster will be initialized with locale en_US.utf8.
fixing permissions on existing directory /data01/hawq/masterdd ... ok
creating subdirectories ... ok
selecting default max_connections ... initdb: error 256 from: "/opt/hawq/bin/postgres" --boot -x0 -F -c max_connections=1280 -c shared_buffers=4000 -c max_fsm_pages=200000 < "/dev/null" > "/home/hawqadmin/hawqAdminLogs/master.initdb" 2>&1
initdb: removing contents of data directory "/data01/hawq/masterdd"
Master postgres initdb failed
20160309:11:40:42:005594 hawq_init:ip-172-31-31-145:hawqadmin-[INFO]:-Master postgres initdb failed
20160309:11:40:42:005594 hawq_init:ip-172-31-31-145:hawqadmin-[ERROR]:-Master init failed, exit
This is running on CentOS 7, and I found the following JIRA in Bigtop: https://issues.apache.org/jira/browse/BIGTOP-2325. Any ideas on a workaround?
Thanks!
Ryan
Created on 03-10-2016 10:46 AM
Hi Ryan,
Could you check if you have the right permissions on the local directory?
[hawqadmin@hdpmaster01 ~]$ ls -ld /data01/hawq/masterdd/
drwx------ 16 hawqadmin hadoop 4096 Mar  1 09:19 /data01/hawq/masterdd/
[hawqadmin@hdpmaster01 ~]$ ls -l /data01/hawq/masterdd/
total 40
drwx------ 5 hawqadmin hawqadmin    38 Feb 29 15:38 base
drwx------ 2 hawqadmin hawqadmin  4096 Mar  1 09:19 global
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_changetracking
drwx------ 2 hawqadmin hawqadmin    17 Feb 29 15:38 pg_clog
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_distributedlog
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_distributedxidmap
-rw-rw-r-- 1 hawqadmin hawqadmin  4021 Feb 29 15:38 pg_hba.conf
-rw------- 1 hawqadmin hawqadmin  1636 Feb 29 15:38 pg_ident.conf
drwx------ 2 hawqadmin hawqadmin   156 Mar  1 00:00 pg_log
drwx------ 4 hawqadmin hawqadmin    34 Feb 29 15:38 pg_multixact
drwx------ 2 hawqadmin hawqadmin     6 Mar  1 09:19 pg_stat_tmp
drwx------ 2 hawqadmin hawqadmin    17 Feb 29 15:38 pg_subtrans
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_tblspc
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_twophase
drwx------ 2 hawqadmin hawqadmin     6 Feb 29 15:38 pg_utilitymodedtmredo
-rw------- 1 hawqadmin hawqadmin     4 Feb 29 15:38 PG_VERSION
drwx------ 3 hawqadmin hawqadmin    58 Feb 29 15:38 pg_xlog
-rw------- 1 hawqadmin hawqadmin 18393 Feb 29 15:38 postgresql.conf
-rw------- 1 hawqadmin hawqadmin   104 Feb 29 15:40 postmaster.opts
Also, what are the permissions on the directory on HDFS?
[hawqadmin@hdpmaster01 ~]$ hdfs dfs -ls -d /hawq_default
drwxr-xr-x   - hawqadmin hdfs          0 2016-02-29 15:38 /hawq_default
[hawqadmin@hdpmaster01 ~]$ hdfs dfs -ls -R /hawq_default
drwx------   - hawqadmin hdfs          0 2016-02-29 15:47 /hawq_default/16385
drwx------   - hawqadmin hdfs          0 2016-03-01 08:54 /hawq_default/16385/16387
drwx------   - hawqadmin hdfs          0 2016-03-01 08:55 /hawq_default/16385/16387/16513
-rw-------   3 hawqadmin hdfs         48 2016-03-01 08:55 /hawq_default/16385/16387/16513/1
-rw-------   3 hawqadmin hdfs          4 2016-02-29 15:47 /hawq_default/16385/16387/PG_VERSION
Created on 03-13-2016 11:59 AM
Hi!
Thanks for the article; the one I was using before (the CentOS 7 version) omitted the ldconfig part, which I got stuck on. However, when installing HAWQ I got some errors: http://pastebin.com/E3heLzS3
Do you have any advice how I could fix this? Thanks!
Tamás
Created on 03-15-2016 08:33 AM
Hi Tamás, I see you are using openjdk 1.7. Try using openjdk 1.8 instead.
Davide
Created on 03-15-2016 11:11 AM
Thanks for the response! Yesterday I ended up redoing the whole process based on your article and it installed flawlessly, and everything is fine!
Best regards, Tamás