Created 10-11-2016 03:37 PM
Can someone explain what the "ScanNext" metric, which is exposed as "Read Latency" in Ambari, signifies? Just trying to correlate this to any performance impact (if any) if we see it in the order of around 10-90 seconds.
Created 10-11-2016 06:44 PM
Scans in HBase work in batches since there is no streaming RPC in HBase. A scanner is opened for a region and the scan is executed as a series of RPC calls to fetch the next set of results. Every such call is a "next" operation, referred to as ScanNext. Each scan-next call tries to fetch either a predefined number of rows (scanner caching) or a predefined maximum result size (2 MB by default). The behavior depends on the version of HBase as well as the configuration. More info here: https://blogs.apache.org/hbase/entry/scan_improvements_in_hbase_1
Seeing 10-90 seconds in the latency metrics means that most of the RPC calls to get the next scan results ended up taking that long. It may be due to a scan reading a lot of data through a highly selective filter and returning little, or something else is wrong that is causing excessive latency for the scans.
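As a rough illustration (the table name and caching value below are hypothetical, not from this thread), scanner caching can be set per scan from the HBase shell; it bounds how many rows each ScanNext RPC tries to fetch:

```
hbase> scan 'mytable', {CACHING => 500}
```

Larger caching values mean fewer round trips per scan, at the cost of more memory held per RPC on both client and server.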
Created 11-07-2016 01:07 AM
I'm seeing a similar "Read Latency" of 10 seconds in Ambari but with HBASE at rest (i.e., nobody is running any scans or anything with it).
This was observed after loading ~30 GB of data (HDFS 30% full now) into a three-node HBase cluster with two Region Servers; the Region Servers had finished auto-balancing regions between them, after which I ran a manual compaction from the HBase web UI. Even restarting HBase has no effect on the read latency.
I noticed too that Grafana has (a corresponding?) graph: HBase-Performance --> Operation Latencies: Scan Next -- but oddly it's one of the very few graphs labeled "No datapoints", as if it were unplugged.
Read latency goes up to 32 seconds when running scan 'tableX' in the hbase shell... no selective filter used. I'm surprised Ambari doesn't have a built-in alarm that trips on anything over 2 seconds (the Ambari GUI reports 'no alarms' even when read latency is ~18 min!). Pretty sure we have a config problem, but not sure what.
I'm running all this in AWS/Ubuntu14 using Ambari 2.2.2.0 (HDP 2.4.2.0-258) managed HBASE 1.1.2.2.4.2.0-258.
Created 11-07-2016 09:24 PM
The above resting-state Read Latency in our system was cut by 98%, to 192 ms, by increasing hbase.client.scanner.max.result.size from the default of 2 MB to 50 MB. In our case, our HBase cell size (== Maximum Record Size == hbase.client.keyvalue.maxsize) is 10 MB, which is 5x larger than said default.
hbase.client.scanner.max.result.size can be used to change the default "chunk size" returned to the client per scan-next call. By default this value is 2 MB since HBase 1.1+.
Note about hbase.server.scanner.max.result.size — this setting enforces a maximum result size (in bytes); when it is reached, the server returns the results it has so far. This is a safety setting and should be kept large. The default is infinite in 0.98 and 1.0.x, and 100 MB in 1.1 and later.
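For reference, the client-side setting discussed above can be placed in hbase-site.xml. The value is in bytes; the 50 MB figure mirrors the tuning described in this thread and is an example, not a general recommendation:

```xml
<!-- hbase-site.xml on the client: larger scan "chunks", fewer ScanNext RPCs -->
<property>
  <name>hbase.client.scanner.max.result.size</name>
  <!-- 50 MB = 50 * 1024 * 1024 bytes; default is 2 MB in HBase 1.1+ -->
  <value>52428800</value>
</property>
```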
Created 11-18-2016 08:38 PM
@Enis can you comment on this?