06-14-2017
03:39 PM
1 Kudo
ZooKeeper clients maintain a state machine of connectedness to the quorum. This state machine is handled in the background as HBase (acting as a client) interacts with ZooKeeper. Under normal conditions, ZooKeeper clients (HBase services like the RegionServer and Master) send a "heartbeat" (an RPC) to ZooKeeper every few seconds, which keeps the client up to date with the quorum.

When the client is unable to maintain this heartbeat for a significant amount of time (by default, ~30 seconds), the ZooKeeper client experiences what is called a session expiration. The session expiration is the result of the ZooKeeper-server-side state being cleaned up, because the server assumes the client (HBase) is no longer interested in using ZooKeeper. This state is normally recovered automatically; however, it typically means that HBase is about to stop. Because so many operations inside HBase rely on ZooKeeper, HBase cannot run through a prolonged period in which it cannot reach ZooKeeper.

All of this typically occurs after an unexpected "pause" of the HBase service, due to load on HBase, load on the machine running HBase, or a host of other issues. Investigate whether there were Java GC issues at the time which caused HBase to stop responding for some period.
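As a minimal sketch of the client-side states involved (this uses the plain ZooKeeper Java API; HBase's internal handling is more elaborate, and the connect string and timeout here are only examples):

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Watches the session state machine described above. The session timeout
// is negotiated with the quorum; if heartbeats fail for longer than the
// negotiated timeout, the server purges the session and the client
// eventually observes an Expired event.
public class SessionWatcher implements Watcher {
  @Override
  public void process(WatchedEvent event) {
    switch (event.getState()) {
      case SyncConnected:
        // Heartbeats are flowing; the client is in sync with the quorum.
        break;
      case Disconnected:
        // Transient network blip; the client library reconnects on its own.
        break;
      case Expired:
        // Server-side session state was cleaned up. This is the condition
        // that causes HBase services to abort rather than run with stale
        // ZooKeeper state.
        break;
      default:
        break;
    }
  }

  public static void main(String[] args) throws Exception {
    // 30000 ms mirrors the ~30 second default mentioned above.
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new SessionWatcher());
    Thread.sleep(Long.MAX_VALUE); // keep the process alive to observe events
  }
}
```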
06-06-2017
03:04 PM
1 Kudo
No, there is presently no option to do what you're asking. HBase stores arbitrary bytes, which means that data in any portion of the response object may generate invalid JSON. If you do choose to write some software to solve your issue, I would guess that the Apache HBase community would accept an option/configuration that adds what you're asking for to the REST server.
06-05-2017
02:31 PM
Allocation failures happen when the JVM heap fills up; an allocation failure is what triggers a garbage collection. It is not a problem in itself; it is simply how Java GC works. What you would need to worry about is whether collections happen too often or take too much time.
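If you want to measure how often collections happen and how long they take, one common approach is to enable GC logging. A sketch using the hbase-env.sh convention from elsewhere in this thread (the log path is a hypothetical example; on HDP releases <=2.6, set HBASE_REGIONSERVER_OPTS instead):

```
export HBASE_SERVER_OPTS="$HBASE_SERVER_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc.log"
```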
06-01-2017
05:00 PM
20 Kudos
I was recently involved with quite possibly the worst HBase performance debugging issue of my career so far. The issue first arose with a generic problem statement: after X hours of processing, tasks accessing HBase begin to take over 10 times longer than before. Upon restarting HBase, performance returned to expected levels. There were no obvious errors in the HBase logs, HDFS logs, or the hosts' syslog. The problem would manifest on a near-constant period: every X hours after a restart. It affected different types of client tasks (both reads and writes) and was not limited to a specific node or set of nodes. Strangely, despite all inspection of HBase logs and profiling information, HBase seemed to be functioning perfectly fine. Just, slower.

This led us to investigate numerous operating-system configuration changes and monitoring, none of which completely explained the circumstances and symptoms of the problem. After many long days of investigation and experiments with JVM options, we stumbled onto the first answer which satisfied (or, at least, didn't invalidate) the circumstances: a known, unfixed bug in Java 7 in which JIT code compilation is disabled after the JIT's code cache executes a flush to reclaim space: https://bugs.openjdk.java.net/browse/JDK-8051955

The JIT (just-in-time) compiler runs behind the scenes in Java, compiling Java byte-code into native machine code. Code compilation is a tool designed to help long-lived Java applications run fast without negatively affecting the start-up time of short-lived applications. After methods are invoked, they are compiled from Java byte-code into machine code and cached by the JVM. Subsequent invocations of a cached method can execute the machine code directly instead of going back through Java byte-code.

Analysis:
On a 64-bit JVM with Java 7, this cache has a size of 50MB, which is sufficient for most applications. Methods which are not used frequently are evicted from the cache; this helps keep the JVM from quickly reaching the limit. With sufficient time, however, the cache can still become full, which triggers a temporary halt of JIT compilation and caching while the cache is flushed. The unresolved issue in Java 7 is that JIT compilation is not re-enabled after the code cache is flushed. The process continues to run, but no new machine code is compiled or cached, which means methods run in the much slower interpreted mode from then on. We were able to confirm that this is what was happening by enabling two JVM options for the HBase services in hbase-env.sh:
-XX:+PrintCompilation
-XX:+PrintSafepointStatistics
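For example, using the same hbase-env.sh convention as the solution below (on HDP releases <=2.6, set HBASE_REGIONSERVER_OPTS instead):

```
export HBASE_SERVER_OPTS="$HBASE_SERVER_OPTS -XX:+PrintCompilation -XX:+PrintSafepointStatistics"
```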
The first option prints a log message for every compilation, every method marked as "not entrant" (the method is a candidate for removal from the cache), and every method marked as "zombie" (removed from the cache). This is helpful in determining when JIT compilation is happening.

The second option prints debugging information about the JVM safepoints which are taken. A JVM safepoint can be thought of as a low-level "lock": the safepoint is taken to provide mutual exclusion at the JVM level. A common use for this option is to analyze the frequency and duration of garbage-collection operations; for example, the concurrent-mark-sweep (CMS) collector takes safepoints at various points in its execution. When the code cache becomes full and a flushing event occurs, a safepoint named "HandleFullCodeCache" is taken.

The combination of these two options can show that a Java process performs JIT compilation up until the point that the "HandleFullCodeCache" safepoint is executed, and that no further JIT compilation happens after that point. In our case, the point at which JIT compilation stopped was within roughly one hour of when the tasks reportedly began to see performance issues.

We did not observe the following log message, which was meant to make this obscure issue more obvious. We missed it because we were working remotely on a decent-sized installation, which made it infeasible to collect and analyze all of the logs:

Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.

Solution:

There are two solutions to this problem: one short-term and one long-term. The short-term solution is to increase the size of the JVM code cache from the default of 50MB on 64-bit JVMs. This can be accomplished via the -XX:ReservedCodeCacheSize JVM option. Increasing this to a larger value can prevent the code cache from ever becoming completely full.

export HBASE_SERVER_OPTS="$HBASE_SERVER_OPTS -XX:ReservedCodeCacheSize=256m"

On HDP releases <=2.6, it is necessary to set the HBASE_REGIONSERVER_OPTS variable explicitly instead:

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:ReservedCodeCacheSize=256m"

The implication of this configuration is that it reduces the memory available to the rest of the process, but the cost is typically quite minor (hundreds of MB when heaps are typically multiple GB).

The long-term solution is to upgrade to Java 8. Java 7 has long been end-of-life'd by Oracle, and this is a prime example of a known issue that was never patched in Java 7. It is strongly recommended that any user still on Java 7 have a plan to move to Java 8 as soon as possible. No other changes are required on Java 8, as it is not subject to this bug.
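If you suspect you are hitting this bug, code-cache occupancy can also be watched from inside a JVM via the standard memory-pool MXBeans. A minimal sketch (on a Java 7 HotSpot JVM the pool is named "Code Cache"):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Prints how full the JIT code cache is. A pool running near its max is
// at risk of the flush (and, on Java 7, the permanent compiler shutdown)
// described above.
public class CodeCacheCheck {
  public static void main(String[] args) {
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if ("Code Cache".equals(pool.getName())) {
        long used = pool.getUsage().getUsed();
        long max = pool.getUsage().getMax(); // bounded by ReservedCodeCacheSize
        System.out.printf("Code cache: %d of %d bytes used (%.1f%%)%n",
            used, max, 100.0 * used / max);
      }
    }
  }
}
```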
06-01-2017
02:26 PM
1 Kudo
MapReduce over HBase snapshots expects the snapshot to exist in the running HBase installation (under hbase.rootdir), not an exported copy of it. You are also providing a path with file:// where you probably want hdfs://.
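For reference, a sketch of wiring up a snapshot-based job (the snapshot name, mapper, and restore directory are hypothetical; note the restore directory is an hdfs:// path):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotScanJob {
  // Identity-style mapper over snapshot rows; real logic goes in map().
  public static class SnapshotMapper
      extends TableMapper<ImmutableBytesWritable, Result> {
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-snapshot");
    job.setJarByClass(SnapshotScanJob.class);
    // "my_snapshot" must be a snapshot known to this HBase installation
    // (e.g. listed by list_snapshots in the shell), not an exported copy.
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "my_snapshot", new Scan(), SnapshotMapper.class,
        ImmutableBytesWritable.class, Result.class, job,
        true, new Path("hdfs:///tmp/snapshot-restore"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```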
05-30-2017
01:57 PM
Yes, you are correct. I meant twenty times, not two.
05-29-2017
04:47 PM
Your comment is a little unclear from what you pasted here, so let me try to clarify. The *maximum* allowed ZooKeeper session timeout is defined to be twenty times the ZooKeeper tickTime. By default on HDP (tickTime of 2000ms), that works out to a maximum allowed session timeout of 20 x 2000ms = 40s. If you feel that you must increase this timeout further, you would have to increase the tickTime as well. Users should *not* do this without considering the consequences: increasing the ZooKeeper tickTime increases the time ZooKeeper takes to process and distribute notifications. In other words, it will increase the latency of applications using ZooKeeper. Needing to raise the ZooKeeper session timeout and tickTime to large values indicates that something else is likely wrong with your system, and the session timeout is being used as a band-aid to hide another problem. The default 30s timeout should be more than enough for most systems to operate happily.
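To make the arithmetic concrete, a sketch of the server-side knob (standard zoo.cfg property; the value shown is the common default):

```
# zoo.cfg: the server caps every client's session timeout at 20 * tickTime
tickTime=2000   # milliseconds => maximum negotiable session timeout = 40000 ms (40s)
# Allowing a 60s session timeout would require tickTime=3000, with the
# notification-latency consequences described above.
```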
05-25-2017
05:05 PM
To debug this, you should provide:

- The full exception
- A reproduction of the problem with the JVM property "-Dsun.security.krb5.debug=true" set on both client and server (capture those logs; see the sketch below)
- DEBUG Log4j level enabled on the RegionServers, with the relevant logging captured

It is somewhat strange that you use "localhost" as the "instance" component of the Kerberos principal. Typically, this is the FQDN of the network interface that the service is listening on. If you are not running everything on 127.0.0.1, you may be running into DNS issues.
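For the Kerberos debug property, a sketch of how it could be passed to the HBase daemons via hbase-env.sh (HBASE_OPTS applies to all HBase JVMs, including the shell used as a client):

```
export HBASE_OPTS="$HBASE_OPTS -Dsun.security.krb5.debug=true"
```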
05-16-2017
03:12 PM
1 Kudo
The classname is "org.apache.hadoop.hive.hbase.HBaseStorageHandler" (capital 'B' in 'HBaseStorageHandler').
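For reference, a minimal sketch of where that classname is used in Hive DDL (table, column, and mapping names are hypothetical):

```sql
CREATE TABLE hbase_backed (rowkey STRING, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val')
TBLPROPERTIES ('hbase.table.name' = 'hbase_backed');
```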
05-15-2017
05:49 PM
1 Kudo
phoenix.client.connection.max.allowed.connections does not exist in the version of Phoenix shipped in HDP-2.4.0.7. hbase.hconnection.threads.max defaults to 256 unless you have configured it otherwise in hbase-site.xml.
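For reference, a sketch of how that pool size would be overridden in hbase-site.xml (the value shown is just the default):

```xml
<property>
  <name>hbase.hconnection.threads.max</name>
  <value>256</value>
</property>
```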