Created on
03-01-2019
11:41 AM
- last edited on
03-02-2019
05:45 AM
by
cjervis
Here is the error message I am seeing in the role logs/Stderr logsrecently, Do I have to make any configuration changes in CM and then restart this server?
+ HBASE_REST_OPTS= ++ replace_pid -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError '-XX:HeapDumpPath=/tmp/hbase_hbase-HBASETHRIFTSERVER-8b300b6e4f58408d2e4b5170d308788e_pid{{PID}}.hprof' -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh ++ echo -Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError '-XX:HeapDumpPath=/tmp/hbase_hbase-HBASETHRIFTSERVER-8b300b6e4f58408d2e4b5170d308788e_pid{{PID}}.hprof' -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh ++ sed 's#{{PID}}#5940#g' + export 'HBASE_THRIFT_OPTS=-Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hbase_hbase-HBASETHRIFTSERVER-8b300b6e4f58408d2e4b5170d308788e_pid5940.hprof -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh' + HBASE_THRIFT_OPTS='-Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hbase_hbase-HBASETHRIFTSERVER-8b300b6e4f58408d2e4b5170d308788e_pid5940.hprof -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
Created 03-04-2019 02:35 AM
The shared stderr snippet looks normal. What do you see in the role logs (under /var/log/hbase folder) during the startup failure? Can you share the complete stack trace if there is any?
Created 03-07-2019 08:35 AM
It don't look like an issue to me in role log. do I have to look into something, this is like a common error I am facing
2019-03-03 10:39:39,626 INFO org.apache.hadoop.hbase.http.HttpServer: Jetty bound to port 9095
2019-03-03 10:39:39,626 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2019-03-03 10:39:40,001 INFO org.mortbay.log: Started SslSocketConnectorSecure@0.0.0.0:9095
2019-03-03 10:39:40,074 INFO org.apache.hadoop.hbase.thrift.ThriftServerRunner: starting TBoundedThreadPoolServer on /0.0.0.0:9090 with readTimeout 60000ms; min worker threads=200, max worker threads=1000, max queued requests=1000
2019-03-04 16:23:27,419 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1404ms
No GCs detected
2019-03-05 16:36:39,673 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1035ms
No GCs detected
2019-03-06 16:10:42,642 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1003ms
No GCs detected
Created 03-11-2019 09:26 AM
There is nothing in the logs but rest of services on the same node also showing health issues because of Java heap dump directory free space issue. Do I have to increase the heap size to permanently resolve this?
Created 03-11-2019 07:06 PM
You're observing JVM pauses but "No GCs detected". This indicates problem with the underlying host, typically kernel level CPU lockups or general process hangings.
Check the host's /var/log/messages to find clues about such issues. When found, rectify them and then see if you still face Thrift issues. We'll take it further accordingly.
Created 03-11-2019 07:12 PM
Created 03-12-2019 09:32 AM
Hello, Thanks for a great explanation. I think what you mentioned is actually what is happening in my cluster. Weird thing is that this error started with HBASE thrift server and then same error was on HDFS,Failover controller, Spark. It stayed for a while and I started trouble shooting it. Shockingly yesterday , I saw my cluster was running fine. any reason for this abnormal behavior?
My cluster is DEV cluster. No data in it. We are just about to enable kerberos in few days, TLS/SSL is already enabled. What would be the best solution?
as of now I can see the stdout.log under /var/run/cloudera-scm-agent/process/ :
JAVA_HOME=/usr/java/jdk1.8.0_162/
using /usr/java/jdk1.8.0_162/ as JAVA_HOME
using 5 as CDH_VERSION
using as HBASE_HOME
using /run/cloudera-scm-agent/process/2813-hbase-HBASETHRIFTSERVER as HBASE_CONF_DIR
using /run/cloudera-scm-agent/process/2813-hbase-HBASETHRIFTSERVER as HADOOP_CONF_DIR
using as HADOOP_HOME
CONF_DIR=/run/cloudera-scm-agent/process/2813-hbase-HBASETHRIFTSERVER
CMF_CONF_DIR=/etc/cloudera-scm-agent
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/hbase_hbase-HBASETHRIFTSERVER-8b300b6e4f58408d2e4b5170d308788e_pid24899.hprof ...
Heap dump file created [18196434 bytes in 0.061 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p
/usr/lib64/cmf/service/common/killparent.sh"
# Executing /bin/sh -c "kill -9 24899
/usr/lib64/cmf/service/common/killparent.sh"...
Created 03-12-2019 07:36 PM