
ERROR on trying to START the HAWQ MASTER service in HDP 2.4 version



I looked in the HAWQ init log file and extracted the details of the error, which relate to a PostgreSQL initdb failure (see below).

Please let me know which parameters (or anything else) I need to reconfigure so that the PostgreSQL initdb succeeds:


16-12-22 16:29:34.155702 UTC,,,p178170,th1083267360,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create shared memory segment: Invalid argument (pg_shmem.c:183)","Failed system call was shmget(key=1, size=506213024, 03600).","This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.  You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request size (currently 506213024 bytes), reduce PostgreSQL's shared_buffers parameter (currently 4000) and/or its max_connections parameter (currently 3000).
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.",,,,,,"InternalIpcMemoryCreate","pg_shmem.c",183,
1    0x8c7098 postgres errstart + 0x288
2    0x7849fe postgres PGSharedMemoryCreate + 0x22e
3    0x7cf176 postgres CreateSharedMemoryAndSemaphores + 0x336
4    0x8d8509 postgres BaseInit + 0x19
5    0x7e67a2 postgres PostgresMain + 0x482
6    0x4a41ec postgres main + 0x4fc
7    0x7f803c6eed5d __libc_start_main + 0xfd
8    0x4a4289 postgres <symbol not found> + 0x4a4289
child process exited with exit code 1
initdb: removing contents of data directory "/data/hawq/master"
Master postgres initdb failed  !!!!
20161222:16:29:34:177771 hawq_init:sandbox:gpadmin-[INFO]:-Master postgres initdb failed  !!!                  
20161222:16:29:34:177771 hawq_init:sandbox:gpadmin-[ERROR]:-Master init failed, exit                         
20161222:16:37:28:180216 hawq_init:sandbox:gpadmin-[INFO]:-Prepare to do 'hawq init'                         
20161222:16:37:28:180216 hawq_init:sandbox:gpadmin-[INFO]:-You can find log in:                              
20161222:16:37:28:180216 hawq_init:sandbox:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_init_20161222.log
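The log shows shmget() rejecting a 506213024-byte request with "Invalid argument", and the hint points at SHMMAX. A minimal Python sketch for confirming that on the master host, assuming Linux with procfs; the helper names are illustrative and not part of HAWQ or PostgreSQL:

```python
from pathlib import Path

# Request size from the failing shmget() call in the log (size=506213024).
REQUEST_BYTES = 506_213_024
SHMMAX_PATH = Path("/proc/sys/kernel/shmmax")  # Linux procfs location of SHMMAX


def read_shmmax(path: Path = SHMMAX_PATH) -> int:
    """Return the kernel's SHMMAX in bytes, as exposed by procfs."""
    return int(path.read_text().split()[0])


def request_fits(request_bytes: int, shmmax: int) -> bool:
    """True if a single segment of this size is within SHMMAX.

    A request above SHMMAX is one of the cases in which shmget() fails with
    EINVAL ("Invalid argument"), which is what the log reports.
    """
    return request_bytes <= shmmax


if __name__ == "__main__":
    if SHMMAX_PATH.exists():
        shmmax = read_shmmax()
        verdict = "fits within" if request_fits(REQUEST_BYTES, shmmax) else "exceeds"
        print(f"Request of {REQUEST_BYTES} bytes {verdict} SHMMAX ({shmmax} bytes)")
    else:
        print("No /proc/sys/kernel/shmmax found; this check assumes Linux.")
```

If the request exceeds SHMMAX, the two options in the hint apply: raise kernel.shmmax or lower shared_buffers and/or max_connections.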

Re: ERROR on trying to START the HAWQ MASTER service in HDP 2.4 version


Ambari is supposed to set these kernel memory settings for you; they should be in /etc/sysctl.d/hawq_sysctl.conf.
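To see what Ambari actually wrote before changing anything by hand, here is a small Python sketch that parses a sysctl-style file and prints its kernel.shmmax entry. It is a simplified reader that only understands "key = value" lines with "#" or ";" comment lines; the two paths it tries are the ones mentioned in this thread, and it only reads files, applying nothing to the kernel:

```python
from pathlib import Path


def parse_sysctl_conf(text: str) -> dict:
    """Parse 'key = value' lines into a dict.

    Blank lines and lines starting with '#' or ';' are skipped. A later
    duplicate key overrides an earlier one, as applying the file in order would.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'key = value' line; ignored in this sketch
        settings[key.strip()] = " ".join(value.split())
    return settings


if __name__ == "__main__":
    for candidate in ("/etc/sysctl.d/hawq_sysctl.conf", "/etc/sysctl.conf"):
        path = Path(candidate)
        if path.is_file():
            conf = parse_sysctl_conf(path.read_text())
            print(f"{candidate}: kernel.shmmax = {conf.get('kernel.shmmax', '(not set)')}")
```

If neither file sets kernel.shmmax, or the value is smaller than the 506213024-byte request in the log, that would be consistent with the initdb failure.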

If you aren't using Ambari, you will need to add the following settings to the /etc/sysctl.conf file on all nodes:

kernel.sem = 250 512000 100 2048
kernel.pid_max = 798720
kernel.shmmax = 1000000000
net.ipv4.conf.all.arp_filter = 1
kernel.shmall = 4000000000
kernel.msgmnb = 65536
net.ipv4.ip_forward = 0
vm.overcommit_ratio = 100
kernel.shmmni = 4096
net.ipv4.conf.default.accept_source_route = 0
kernel.msgmni = 2048
kernel.core_uses_pid = 1
net.core.rmem_max = 2097152
vm.overcommit_memory = 2
kernel.msgmax = 65536
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_syn_backlog = 200000
kernel.threads-max = 798720
fs.nr_open = 3000000
net.core.wmem_max = 2097152
net.core.netdev_max_backlog = 200000
kernel.sysrq = 1
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.tcp_tw_recycle = 1

Then reload the settings on all nodes:

sysctl -p
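sysctl -p applies the file but does not summarize whether every value took effect. One way to confirm on each node, sketched in Python under the assumption of Linux procfs, where each sysctl key maps to a path under /proc/sys by replacing dots with slashes. EXPECTED is an illustrative subset of the list above, and the function names are hypothetical:

```python
from pathlib import Path

# Illustrative subset of the recommended values listed above.
EXPECTED = {
    "kernel.shmmax": "1000000000",
    "kernel.shmall": "4000000000",
    "kernel.shmmni": "4096",
    "kernel.sem": "250 512000 100 2048",
    "vm.overcommit_memory": "2",
}


def proc_path(key: str) -> Path:
    """Map a sysctl key such as 'kernel.shmmax' to /proc/sys/kernel/shmmax."""
    return Path("/proc/sys") / key.replace(".", "/")


def normalize(value: str) -> str:
    """Collapse whitespace, since procfs reports kernel.sem with tabs."""
    return " ".join(value.split())


def read_live(key: str):
    """Return the live value from procfs, or None if it cannot be read."""
    try:
        return proc_path(key).read_text()
    except OSError:
        return None


def mismatches(expected: dict, read=read_live) -> dict:
    """Return {key: (expected, actual)} for every key whose live value differs."""
    diffs = {}
    for key, want in expected.items():
        raw = read(key)
        got = normalize(raw) if raw is not None else None
        if got != normalize(want):
            diffs[key] = (want, got)
    return diffs


if __name__ == "__main__":
    diffs = mismatches(EXPECTED)
    for key, (want, got) in diffs.items():
        print(f"{key}: expected {want!r}, got {got!r}")
    if not diffs:
        print("All checked settings match.")
```

The read function is a parameter so the comparison logic can be exercised without touching /proc.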

You'll also want to set vm.overcommit_ratio based on the amount of RAM your nodes have. My cluster has plenty of RAM, so the ratio above is set to 100, but here are the guidelines:

2 GB to 64 GB of RAM: vm.overcommit_ratio = 50

64 GB of RAM or more: vm.overcommit_ratio = 100
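The guideline can be written as a small function. It is paired here with the Linux commit-limit formula that applies when vm.overcommit_memory = 2 (CommitLimit = swap + RAM * ratio / 100, ignoring huge pages), which shows what the ratio actually controls. The function names are hypothetical; exactly 64 GB is treated as the upper bracket, following the ">=64GB" line, and sizes below 2 GB are rejected because the thread gives no guideline there:

```python
def overcommit_ratio(ram_gb: float) -> int:
    """vm.overcommit_ratio guideline from this thread: 50 for 2-64 GB, 100 for 64 GB and up."""
    if ram_gb < 2:
        raise ValueError("the guideline does not cover less than 2 GB of RAM")
    return 100 if ram_gb >= 64 else 50


def commit_limit_gb(ram_gb: float, swap_gb: float, ratio: int) -> float:
    """Approximate CommitLimit under vm.overcommit_memory = 2 (huge pages ignored)."""
    return swap_gb + ram_gb * ratio / 100


if __name__ == "__main__":
    # Illustrative node sizes: (RAM in GB, swap in GB).
    for ram, swap in ((32, 16), (128, 4)):
        ratio = overcommit_ratio(ram)
        print(f"{ram} GB RAM, {swap} GB swap -> ratio {ratio}, "
              f"CommitLimit ~ {commit_limit_gb(ram, swap, ratio):g} GB")
```

For a 32 GB node with 16 GB of swap, a ratio of 50 gives a commit limit of about 32 GB; a 128 GB node with 4 GB of swap and a ratio of 100 gets about 132 GB.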

And don't forget swap space:

2 GB to 8 GB of RAM: swap space equal to RAM

8 GB to 64 GB of RAM: swap space equal to half of RAM

64 GB of RAM or more: 4 GB of swap space
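The swap guideline translates directly into a lookup function. This is a sketch with a hypothetical name; the bracket boundaries are an assumption, with exactly 8 GB and exactly 64 GB placed in the higher bracket, matching the ">= 64GB" wording:

```python
def recommended_swap_gb(ram_gb: float) -> float:
    """Swap-size guideline from this thread, in GB."""
    if ram_gb < 2:
        raise ValueError("the guideline does not cover less than 2 GB of RAM")
    if ram_gb < 8:
        return float(ram_gb)   # 2-8 GB of RAM: swap equal to RAM
    if ram_gb < 64:
        return 0.5 * ram_gb    # 8-64 GB of RAM: half of RAM
    return 4.0                 # 64 GB of RAM or more: a flat 4 GB


if __name__ == "__main__":
    for ram in (4, 16, 48, 256):
        print(f"{ram} GB RAM -> {recommended_swap_gb(ram):g} GB swap")
```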

Also set these limits in /etc/security/limits.conf:

* soft nofile 2900000
* hard nofile 2900000
* soft nproc 131072
* hard nproc 131072
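limits.conf is applied at login (by pam_limits), so sessions that were already open keep their old limits. A Python sketch for checking the effective values, meant to be run as gpadmin from a fresh login; it assumes a Unix system where the standard resource module provides RLIMIT_NOFILE and RLIMIT_NPROC, and the helper name is hypothetical:

```python
import resource

# Values from the limits.conf lines above (soft and hard are set to the same number).
RECOMMENDED = {
    "RLIMIT_NOFILE": 2_900_000,  # nofile
    "RLIMIT_NPROC": 131_072,     # nproc
}


def below_recommendation(current: int, recommended: int) -> bool:
    """True if a limit is lower than recommended; RLIM_INFINITY always satisfies it."""
    if current == resource.RLIM_INFINITY:
        return False
    return current < recommended


if __name__ == "__main__":
    for name, want in RECOMMENDED.items():
        soft, hard = resource.getrlimit(getattr(resource, name))
        for label, value in (("soft", soft), ("hard", hard)):
            status = "LOW" if below_recommendation(value, want) else "ok"
            print(f"{name} {label}: {value} (want >= {want}) {status}")
```

The explicit RLIM_INFINITY check matters because "unlimited" is reported as a sentinel value rather than a large number on some platforms.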

Lastly, you should review this document:

There are additional HDFS configuration requirements you should pay attention to.