HDB 2.1.1

Reference: http://hdb.docs.pivotal.io/211

http://hdb.docs.pivotal.io/211/hdb/releasenotes/HAWQ211ReleaseNotes.html

http://hdb.docs.pivotal.io/211/hdb/install/install-ambari.html

Download HDB from Hortonworks at http://hortonworks.com/downloads/ or directly from Pivotal at https://network.pivotal.io/products/pivotal-hdb (you need to create a Pivotal account).

What to look out for

  1. If you use only 1 Master Node, you cannot have both a HAWQ Master and a HAWQ Standby Master.
  2. If you install the HAWQ Master on the same node as the Ambari server, change the HAWQ Master PostgreSQL port from the default 5432 during install, since Ambari's embedded PostgreSQL already listens on that port (see the quick check below).
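
  A quick way to confirm the port conflict before install (a minimal check; assumes net-tools is available on the Ambari node):

  netstat -tlnp | grep 5432  # if Ambari's embedded PostgreSQL is already listening on 5432, pick a different port for the HAWQ Master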

Prep

  1. Ensure that httpd is installed
yum install httpd

sudo service httpd status

sudo service httpd start
  • Get and Install repo
    1. Log onto Pivotal and download hdb-2.1.1.0-7.tar
  • /* On Ambari Node */
    
    1. mkdir /staging
    2. chmod a+rx /staging
    3. scp -i <<your key>> -o 'StrictHostKeyChecking=no' hdb-2.1.1.0-7.tar root@<<ambarinode>>:/staging
    4. tar -zxvf hdb-2.1.1.0-7.tar
       cd /staging/hdb-2.1.1.0
       ./setup_repo.sh
    
    /* You should see the message "hdb-2.1.1.0 Repo file successfully created at /etc/yum.repos.d/hdb-2.1.1.0.repo." */
    
    5. yum install -y hawq-ambari-plugin
    
    6. cd /var/lib/hawq
    
    7. ./add-hawq.py --user admin --password admin --stack HDP-2.5
    /* Use the line above if the repo is on the same node as Ambari; otherwise point to where the repo lives: */
    ./add-hawq.py --user <admin-username> --password <admin-password> --stack HDP-2.5 --hawqrepo <hdb-2.1.x-url> --addonsrepo <hdb-add-ons-2.1.x-url>
    
    8. ambari-server restart
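
    At this point you can sanity-check that the repo and plugin are in place (a minimal sketch; the grep pattern assumes the repo id contains "hdb"):

    yum repolist | grep -i hdb
    cat /etc/yum.repos.d/hdb-2.1.1.0.repo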
  • Configurations during Install with Ambari
    1. Set vm.overcommit_memory to 0 if you also plan to run Hive and/or LLAP on the same cluster; don't follow the Pivotal docs' recommendation to set it to 2, or else your Hive processes will have memory issues.
    2. Advanced hdfs-site

       Property                                   Setting
       dfs.allow.truncate                         true
       dfs.block.access.token.enable              false for an unsecured HDFS cluster, or true for a secure cluster
       dfs.block.local-path-access.user           gpadmin
       dfs.client.read.shortcircuit               true
       dfs.client.socket-timeout                  300000000
       dfs.client.use.legacy.blockreader.local    false
       dfs.datanode.handler.count                 60
       dfs.datanode.socket.write.timeout          7200000
       dfs.namenode.handler.count                 600
       dfs.support.append                         true

    3. Advanced core-site

       Property                                   Setting
       ipc.client.connection.maxidletime          3600000
       ipc.client.connect.timeout                 300000
       ipc.server.listen.queue.size               3300
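
    After saving these configs in Ambari and restarting the affected services, you can spot-check that the values took effect (hdfs getconf reads the client-side configuration on the node where it runs):

    hdfs getconf -confKey dfs.allow.truncate
    hdfs getconf -confKey dfs.datanode.handler.count
    hdfs getconf -confKey ipc.server.listen.queue.size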

    Some HAWQ Resources

    1. Date Type Formatting Functions: https://www.postgresql.org/docs/8.2/static/functions-formatting.html
    2. Date Time Functions: https://www.postgresql.org/docs/8.2/static/functions-datetime.html
    3. HAWQ Date Functions: http://tapoueh.org/blog/2013/08/20-Window-Functions. HAWQ is better with dates; it can automatically handle '08/01/2016' and '01-Aug-2016' (see the example after this list).
    4. PostGreSQL Cheat Sheet Commands: http://www.postgresonline.com/downloads/special_feature/postgresql83_psql_cheatsheet.pdf
    5. System Catalog Tables: http://hdb.docs.pivotal.io/131/docs-hawq-shared/ref_guide/system_catalogs/catalog_ref-tables.html
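
    A quick illustration of the date handling mentioned in item 3 (a minimal sketch; assumes the default MDY datestyle, so '08/01/2016' parses as August 1, 2016):

    SELECT '08/01/2016'::date, '01-Aug-2016'::date, to_char(current_date, 'DD-Mon-YYYY');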

    HAWQ Toolkit

    1. Make sure to make use of the HAWQ Toolkit: http://hdb.docs.pivotal.io/211/hawq/reference/toolkit/hawq_toolkit.html
    2. How to find the data files for specific tables: https://discuss.pivotal.io/hc/en-us/articles/204072646-Pivotal-HAWQ-find-data-files-for-specific-tab...
    3. Size of a table on disk:
    select * from hawq_toolkit.hawq_size_of_table_disk;
    4. How to find the size of a database:
    select sodddatname, sodddatsize/(1024*1024) from hawq_toolkit.hawq_size_of_database;
    5. How to find the size of partitioned tables:
    select * from hawq_toolkit.hawq_size_of_partition_and_indexes_disk;
    6. Tip to find how many segments hold data for a HAWQ table:
    SELECT gp_segment_id, COUNT(*)
      FROM <<table>>
      GROUP BY gp_segment_id
      ORDER BY 1;
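
    The toolkit views also compose into quick reports. For example, a minimal sketch listing the ten largest tables by size on disk (assuming the sotdschemaname, sotdtablename, and sotdsize columns documented for hawq_size_of_table_disk):

    SELECT sotdschemaname, sotdtablename, sotdsize/(1024*1024) AS size_mb
      FROM hawq_toolkit.hawq_size_of_table_disk
      ORDER BY sotdsize DESC
      LIMIT 10;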

    Creating Tables

    <<TBD>>

    1. Make sure to run ANALYZE after you create and load a table. As an example:
    vacuum analyze device.priority_counter_hist_rand;
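
    While the full walkthrough is still TBD, here is a minimal DDL sketch of the kind of table the example above refers to (the column list and storage options are illustrative assumptions, not the original definition):

    CREATE TABLE device.priority_counter_hist_rand (
        device_id   bigint,
        event_ts    timestamp,
        priority    int,
        counter_val bigint
    )
    WITH (appendonly=true, orientation=parquet, compresstype=snappy)  -- columnar Parquet storage with snappy compression
    DISTRIBUTED RANDOMLY;  -- random distribution; use DISTRIBUTED BY (col) if you have a good hash key

    VACUUM ANALYZE device.priority_counter_hist_rand;  -- refresh planner statistics after creating and loading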

    Loading Data to Tables

    <<TBD>>
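
    Until this section is written, a common HAWQ load pattern is a PXF readable external table over HDFS followed by INSERT ... SELECT. A minimal sketch, where the PXF host/port, HDFS path, and column list are illustrative assumptions:

    CREATE EXTERNAL TABLE device.priority_counter_hist_ext (
        device_id   bigint,
        event_ts    timestamp,
        priority    int,
        counter_val bigint
    )
    LOCATION ('pxf://<<namenode>>:51200/data/priority_counters?PROFILE=HdfsTextSimple')  -- 51200 is the default PXF port
    FORMAT 'CSV' (DELIMITER ',');

    INSERT INTO device.priority_counter_hist_rand
    SELECT * FROM device.priority_counter_hist_ext;

    VACUUM ANALYZE device.priority_counter_hist_rand;  -- analyze again after the load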

    Potential HAWQ Errors

    Too many open files in system

    1. To fix this, check the value of fs.file-max in /etc/sysctl.conf. If the configured value is lower than the total number of open files on the entire system at a given point (lsof | wc -l), then it needs to be increased. To increase the value, follow the steps below:
      1. Count open files: lsof | wc -l
      2. Check the current limit: ulimit -a | grep open
      3. Edit the following line in the /etc/sysctl.conf file:
      4. fs.file-max = value   # value is the new file descriptor limit that you want to set
      5. Apply the change by running the following command: /sbin/sysctl -p
      6. We can disable over-commit temporarily: echo 0 > /proc/sys/vm/overcommit_memory
    2. For a permanent solution, add the following to /etc/sysctl.conf (the old fs.file-max value is left commented out):
      #fs.file-max=65536
      fs.file-max=2900000
      # Added for Hortonworks HDB
      kernel.threads-max=798720
      vm.overcommit_memory=0
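
    To apply and verify the permanent settings without a reboot (run as root on each affected node):

    /sbin/sysctl -p  # reload /etc/sysctl.conf
    /sbin/sysctl fs.file-max kernel.threads-max vm.overcommit_memory  # confirm the running values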