I looked in the HAWQ init log file and extracted the details of the error, which relates to a PostgreSQL initdb failure (see below).
Please let me know which parameters or other settings I need to reconfigure so that PostgreSQL initdb succeeds:
16-12-22 16:29:34.155702 UTC,,,p178170,th1083267360,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create shared memory segment: Invalid argument (pg_shmem.c:183)","Failed system call was shmget(key=1, size=506213024, 03600).","This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 506213024 bytes), reduce PostgreSQL's shared_buffers parameter (currently 4000) and/or its max_connections parameter (currently 3000). If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for. The PostgreSQL documentation contains more information about shared memory configuration.",,,,,,"InternalIpcMemoryCreate","pg_shmem.c",183,
1 0x8c7098 postgres errstart + 0x288
2 0x7849fe postgres PGSharedMemoryCreate + 0x22e
3 0x7cf176 postgres CreateSharedMemoryAndSemaphores + 0x336
4 0x8d8509 postgres BaseInit + 0x19
5 0x7e67a2 postgres PostgresMain + 0x482
6 0x4a41ec postgres main + 0x4fc
7 0x7f803c6eed5d libc.so.6 __libc_start_main + 0xfd
8 0x4a4289 postgres <symbol not found> + 0x4a4289
child process exited with exit code 1
initdb: removing contents of data directory "/data/hawq/master"
Master postgres initdb failed !!!!
20161222:16:29:34:177771 hawq_init:sandbox:gpadmin-[INFO]:-Master postgres initdb failed !!!
20161222:16:29:34:177771 hawq_init:sandbox:gpadmin-[ERROR]:-Master init failed, exit
20161222:16:37:28:180216 hawq_init:sandbox:gpadmin-[INFO]:-Prepare to do 'hawq init'
20161222:16:37:28:180216 hawq_init:sandbox:gpadmin-[INFO]:-You can find log in:
20161222:16:37:28:180216 hawq_init:sandbox:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_init_20161222.log
Ambari is supposed to set the kernel memory settings for you. They should be in /etc/sysctl.d/hawq_sysctl.conf
If you aren't using Ambari, then you will need to make the following changes to the /etc/sysctl.conf file on all nodes:
kernel.sem = 250 512000 100 2048
kernel.pid_max = 798720
kernel.shmmax = 1000000000
net.ipv4.conf.all.arp_filter = 1
kernel.shmall = 4000000000
kernel.msgmnb = 65536
net.ipv4.ip_forward = 0
vm.overcommit_ratio = 100
kernel.shmmni = 4096
net.ipv4.conf.default.accept_source_route = 0
kernel.msgmni = 2048
kernel.core_uses_pid = 1
net.core.rmem_max = 2097152
vm.overcommit_memory = 2
kernel.msgmax = 65536
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_syn_backlog = 200000
kernel.threads-max = 798720
fs.nr_open = 3000000
net.core.wmem_max = 2097152
net.core.netdev_max_backlog = 200000
kernel.sysrq = 1
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.tcp_tw_recycle = 1
Then reload the settings on all nodes (for example, with sysctl -p).
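If you want to confirm that the new values actually took effect after the reload, something like the following can help. It is only a sketch: it assumes a Linux node where each sysctl key maps to a file under /proc/sys (dots become path separators), and the function names parse_sysctl, live_value and mismatches are made up for this example.

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise whitespace so '250  512000' and '250\t512000' compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key, proc_root="/proc/sys"):
    """Read the running kernel's value for a sysctl key, or None if unreadable."""
    path = Path(proc_root, *key.split("."))
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


def mismatches(wanted, proc_root="/proc/sys"):
    """Return {key: (wanted, live)} for every setting that is not in effect."""
    result = {}
    for key, value in wanted.items():
        live = live_value(key, proc_root)
        if live != value:
            result[key] = (value, live)
    return result
```

For example, mismatches(parse_sysctl(open("/etc/sysctl.conf").read())) should return an empty dict on a node where every setting is in effect. For this particular error, the one that matters is kernel.shmmax, which has to be at least the 506213024 bytes that the failed shmget call asked for.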
You'll also want to set your vm.overcommit_ratio based on the amount of RAM your nodes have. My cluster has plenty of RAM, so the ratio is set to 100, but here are the guidelines:
2GB-64GB of RAM, vm.overcommit_ratio = 50
>=64GB of RAM, vm.overcommit_ratio = 100
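As a rough illustration of the overcommit guideline, here is a small helper. The function name is hypothetical, and since the two ranges overlap at exactly 64GB, it assumes that a 64GB node falls into the ">=64GB" tier.

```python
def overcommit_ratio(ram_gb):
    """Suggested vm.overcommit_ratio for a node with ram_gb gigabytes of RAM."""
    if ram_gb < 2:
        raise ValueError("the guidelines start at 2GB of RAM")
    # Assumption: exactly 64GB is treated as the '>=64GB' tier.
    return 100 if ram_gb >= 64 else 50
```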
And don't forget swap space:
2GB-8GB of RAM, set swap space equal to RAM
8GB-64GB of RAM, set swap space 0.5*RAM
>= 64GB of RAM, set swap space to 4GB
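The swap guideline can be written down the same way. Again this is just a sketch with a made-up function name, and it assumes that a node sitting exactly on a boundary (8GB or 64GB) belongs to the higher tier.

```python
def swap_gb(ram_gb):
    """Suggested swap size in GB for a node with ram_gb gigabytes of RAM."""
    if ram_gb < 2:
        raise ValueError("the guidelines start at 2GB of RAM")
    if ram_gb >= 64:
        return 4.0            # large nodes: fixed 4GB of swap
    if ram_gb >= 8:
        return 0.5 * ram_gb   # mid-size nodes: half of RAM
    return float(ram_gb)      # small nodes: swap equal to RAM
```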
You will also need these user limits (in /etc/security/limits.conf):
* soft nofile 2900000
* hard nofile 2900000
* soft nproc 131072
* hard nproc 131072
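To see whether the nofile and nproc limits above are actually in force, you could run something like this as the user that starts HAWQ (gpadmin in the log above) from a fresh login session. It is a sketch using Python's standard resource module on a Unix-like system; the helper names are made up for the example.

```python
import resource

# (soft, hard) values taken from the limits entries above.
WANTED = {"nofile": (2900000, 2900000), "nproc": (131072, 131072)}
RLIMITS = {"nofile": resource.RLIMIT_NOFILE, "nproc": resource.RLIMIT_NPROC}


def meets(current, wanted):
    """True if a current limit satisfies the wanted one (unlimited always does)."""
    return current == resource.RLIM_INFINITY or current >= wanted


def shortfalls(limits, wanted=WANTED):
    """limits maps name -> (soft, hard); return the names that fall short."""
    return sorted(
        name
        for name, (soft, hard) in wanted.items()
        if not (meets(limits[name][0], soft) and meets(limits[name][1], hard))
    )


def current_limits():
    """Read the calling process's current (soft, hard) limits."""
    return {name: resource.getrlimit(rlimit) for name, rlimit in RLIMITS.items()}
```

Calling shortfalls(current_limits()) returns an empty list when both limits are at least as high as the values above.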
Lastly, you should review this document: http://hdb.docs.pivotal.io/211/hdb/install/install-cli.html
There are additional HDFS configuration requirements you should pay attention to.