Member since: 10-22-2015
Posts: 28
Kudos received: 19
Solutions: 4

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 898 | 08-19-2016 01:32 AM |
| | 6430 | 08-19-2016 12:12 AM |
| | 1487 | 07-17-2016 08:59 PM |
| | 2585 | 07-12-2016 09:34 PM |
05-06-2022
02:42 AM
@arunpoy If you are using CDH/CDP, both timeout parameters (hbase.rpc.timeout and hbase.client.scanner.timeout.period) need to be added on both the server side and the client side, in the following HBase configuration snippets:
- HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml
- HBase Client Advanced Configuration Snippet (Safety Valve) for hbase-site.xml

Set the RPC timeout (hbase.rpc.timeout) a bit higher than the client scanner timeout (hbase.client.scanner.timeout.period).
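A minimal Python sketch of the rule above: it renders both properties as hbase-site.xml `<property>` entries (the kind of XML you could paste into either Safety Valve) and refuses values where the RPC timeout is not higher than the scanner timeout. The function name and the 180000/120000 ms values are illustrative assumptions, not recommended settings.

```python
import xml.etree.ElementTree as ET


def timeout_snippet(rpc_timeout_ms: int, scanner_timeout_ms: int) -> str:
    """Render hbase-site.xml <property> entries for both timeouts.

    Enforces the rule from the post: hbase.rpc.timeout must be higher
    than hbase.client.scanner.timeout.period.
    """
    if rpc_timeout_ms <= scanner_timeout_ms:
        raise ValueError(
            "hbase.rpc.timeout should be higher than "
            "hbase.client.scanner.timeout.period"
        )
    props = {
        "hbase.rpc.timeout": rpc_timeout_ms,
        "hbase.client.scanner.timeout.period": scanner_timeout_ms,
    }
    parts = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
        # encoding="unicode" returns a str without an XML declaration
        parts.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(parts)


# Illustrative values only: the same snippet goes into both the Service
# and the Client Safety Valve.
print(timeout_snippet(rpc_timeout_ms=180000, scanner_timeout_ms=120000))
```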
09-08-2021
02:43 PM
This whole series is really insightful and helpful!
03-31-2017
11:25 PM
6 Kudos
General optimizations
- Do not run the HDFS balancer. It breaks data locality, and data locality is important for latency-sensitive applications.
- For the very same reason, disable HBase automatic region balancing: balance_switch false
- Disable periodic automatic major compactions for time-series data. Time-series data is immutable (usually no updates or deletes), so the only remaining reason for major compaction is to reduce the number of store files. We will instead apply a different compaction policy, which limits the number of files and does not require major compaction (see below).
- Presplit table(s) with time-series data in advance.
- Disable region splits completely (set DisabledRegionSplitPolicy). Region splitting results in major compaction, and we do not run major compactions because they usually decrease performance and stability and increase operation latencies.
- Enable WAL compression to decrease write IO.
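With region splits disabled, the table keeps the regions it was created with (short of manual splits), so the split points have to be chosen up front. A minimal Python sketch, assuming (hypothetically) that row keys begin with a two-hex-digit salt or hash prefix `00`..`ff` so writes spread evenly: it computes evenly spaced split points and prints an HBase shell `create` command using `SPLITS`. The table name `ts_metrics` and family `raw` are made up for the example.

```python
def presplit_keys(num_regions: int, salt_buckets: int = 256) -> list[str]:
    """Evenly spaced split points over a hypothetical 2-hex-digit key prefix."""
    if not 1 <= num_regions <= salt_buckets:
        raise ValueError("num_regions must be between 1 and salt_buckets")
    # N regions need N - 1 split points; each point is the first prefix
    # that belongs to the next region.
    return [format(i * salt_buckets // num_regions, "02x")
            for i in range(1, num_regions)]


def create_command(table: str, family: str, splits: list[str]) -> str:
    """Build an HBase shell 'create' command with explicit split points."""
    quoted = ", ".join(f"'{s}'" for s in splits)
    return f"create '{table}', '{family}', SPLITS => [{quoted}]"


print(create_command("ts_metrics", "raw", presplit_keys(4)))
# create 'ts_metrics', 'raw', SPLITS => ['40', '80', 'c0']
```

Choose the region count from the expected data volume per region, since with DisabledRegionSplitPolicy the table will not add regions on its own.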
Table design
- Do not store data in a raw format; use time-series-specific compression (refer to the OpenTSDB row key design).
- Create a coprocessor that runs periodically and compresses raw data.
- Have separate column families for raw and compressed data.
- Increase hbase.hstore.blockingStoreFiles for both column families.
- Use FIFOCompactionPolicy for raw data (see below).
- Use standard exploring compaction with a limit on the maximum selection size for compressed data (see below).
- Use gzip block compression (GZ) for raw data to decrease write IO.
- Disable the block cache for raw data (this reduces block cache churn significantly).

FIFO compaction
- First-In-First-Out.
- No compaction at all: TTL-expired data just gets archived.
- Ideal for raw data storage (minimum IO overhead).
- No compaction, so no block cache thrashing.
- Sustains hundreds of MB/s of write throughput per RegionServer.
- Available in 0.98.17, 1.2+, and HDP-2.4+.
- Refer to https://issues.apache.org/jira/browse/HBASE-14468 for usage and configuration.

Exploring Compaction + Max Size
- Set hbase.hstore.compaction.max.size to an appropriate value (say, 500MB). With the default region size of 10GB, this results in a maximum of about 20 store files per region.
- This helps preserve temporal locality of data: data points that are close in time are stored in the same file, and distant ones in separate files.
- This compaction works better with the block cache, because recent data can be cached more efficiently.
- Good for the most-recent-most-valuable data access pattern. Use it for compressed and aggregated data.
- Helps keep recent data in the block cache.
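To make the table-design advice concrete, here is a simplified Python sketch loosely modeled on the OpenTSDB layout: a 3-byte metric ID plus an hour-aligned 4-byte base timestamp as the row key, and a 2-byte seconds-offset qualifier, so one row holds one hour of points. `compress_row` stands in for the job the periodic coprocessor would do (packing an hour of raw cells into one compressed cell for the compressed column family). Tags, OpenTSDB's qualifier flag bits, and the real coprocessor API are deliberately left out; all function names are hypothetical.

```python
import struct
import zlib

HOUR = 3600  # seconds per row bucket


def row_key(metric_uid: int, ts: int) -> bytes:
    """3-byte metric ID + 4-byte hour-aligned base timestamp (no tags)."""
    base = ts - ts % HOUR
    return metric_uid.to_bytes(3, "big") + struct.pack(">I", base)


def qualifier(ts: int) -> bytes:
    """Offset in seconds from the row's base hour, stored in 2 bytes."""
    return struct.pack(">H", ts % HOUR)


def compress_row(cells: dict[bytes, float]) -> bytes:
    """Pack one hour of raw cells (qualifier -> value) into a single blob."""
    packed = b"".join(q + struct.pack(">d", v) for q, v in sorted(cells.items()))
    return zlib.compress(packed)


def decompress_row(blob: bytes) -> dict[bytes, float]:
    """Inverse of compress_row: each entry is 2-byte qualifier + 8-byte double."""
    raw = zlib.decompress(blob)
    out = {}
    for i in range(0, len(raw), 10):
        (value,) = struct.unpack(">d", raw[i + 2:i + 10])
        out[raw[i:i + 2]] = value
    return out


ts = 1_700_000_000
# Ten points one minute apart all land in the same hourly row.
cells = {qualifier(ts + i * 60): float(i) for i in range(10)}
blob = compress_row(cells)
print(row_key(1, ts).hex(), len(cells), "cells ->", len(blob), "bytes")
```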
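The two compaction strategies above can be contrasted with a toy Python model (not HBase code): FIFO drops a whole store file once its newest cell is older than the TTL, without rewriting anything, while the max-size limit keeps large files out of compaction selection, which is what caps the file count per region. `StoreFile` and the helper names are hypothetical, and real selection in ExploringCompactionPolicy also weighs file-size ratios and file-count limits that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class StoreFile:
    name: str
    size_mb: int
    max_timestamp: int  # newest cell timestamp in the file, in seconds


def fifo_expired(files: list[StoreFile], now: int, ttl: int) -> list[StoreFile]:
    """FIFO idea: a file is removed whole once even its newest cell has expired."""
    return [f for f in files if f.max_timestamp < now - ttl]


def exploring_candidates(files: list[StoreFile], max_size_mb: int) -> list[StoreFile]:
    """Files above hbase.hstore.compaction.max.size are left out of selection."""
    return [f for f in files if f.size_mb <= max_size_mb]


def max_store_files(region_size_mb: int, max_size_mb: int) -> int:
    """Rough upper bound on large store files per region."""
    return region_size_mb // max_size_mb


files = [StoreFile("a", 300, 1000), StoreFile("b", 700, 5000), StoreFile("c", 200, 9000)]
print([f.name for f in fifo_expired(files, now=10000, ttl=6000)])     # ['a']
print([f.name for f in exploring_candidates(files, max_size_mb=500)])  # ['a', 'c']
print(max_store_files(10 * 1024, 500))                                 # 20
```

The last line reproduces the arithmetic from the post: a 10GB region with a 500MB cap yields about 20 store files.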
03-23-2017
09:37 PM
I see an elapsed time of 2m 50s, and at the bottom: Aggregate Resource Allocation: 53607 MB-seconds, 213 vcore-seconds
08-25-2016
06:14 AM
2 Kudos
New table region split/merge API

A new API in HBase (HDP 2.5) allows users to disable and enable automatic region splits and merges. From the HBase shell you can run the following commands:

Enable region splits
hbase> splitormerge_switch 'SPLIT', true

Disable region splits
hbase> splitormerge_switch 'SPLIT', false

Enable region merges
hbase> splitormerge_switch 'MERGE', true

Disable region merges
hbase> splitormerge_switch 'MERGE', false
Check region split switch status
hbase> splitormerge_enabled 'SPLIT'

Check region merge switch status
hbase> splitormerge_enabled 'MERGE'

Usage in HBase hbck tool

The HBase hbck tool can automatically use this API during restore operations if the -disableSplitAndMerge command-line argument is specified or the tool is run in repair mode. Disabling region splits and merges during repair or diagnostic runs improves the tool's ability to diagnose and repair an HBase cluster.

Usage in table snapshots

It is now recommended to disable both region splits and merges before you run the snapshot command. On large tables with many regions, splits and merges during a snapshot operation will cause the snapshot to fail in its verification phase, so disable them completely and restore their states after the snapshot operation:

hbase> splitormerge_switch 'SPLIT', false
hbase> splitormerge_switch 'MERGE', false
hbase> snapshot 'namespace:sourceTable', 'snapshotName'
hbase> splitormerge_switch 'SPLIT', true
hbase> splitormerge_switch 'MERGE', true

Usage during bulk data load

Bulk loads sometimes take a long time because the loader tool must split HFiles at new region boundaries. Why? Because during the operation some regions can be split or merged, and prepared HFiles that cross these new boundaries must be split. The split operation is performed in a single JVM and may take substantial time. Meanwhile, further splits and merges can happen and will require new HFile splits. This chain of events (region split/merge -> HFile splits -> region split/merge -> ...) can be very long. This is why the new split/merge API is important during HBase bulk data load: disable splits and merges before you run a bulk load, and restore their status afterwards.
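A minimal Python sketch of the two ideas in this post, under stated assumptions: `splits_and_merges_disabled` disables both switches and restores their previous states afterwards (the pattern recommended above for snapshots and bulk loads), and `hfiles_needing_split` shows why a boundary change during a bulk load forces HFile splits. `FakeSwitches` is a hypothetical in-memory stand-in for the cluster switches, not an HBase client API.

```python
from bisect import bisect_right
from contextlib import contextmanager


class FakeSwitches:
    """Hypothetical in-memory stand-in for the SPLIT/MERGE switches."""

    def __init__(self, split: bool = True, merge: bool = True):
        self.state = {"SPLIT": split, "MERGE": merge}

    def switch(self, kind: str, enabled: bool) -> bool:
        """Set a switch and return its previous state."""
        previous = self.state[kind]
        self.state[kind] = enabled
        return previous


@contextmanager
def splits_and_merges_disabled(switches: FakeSwitches):
    """Disable both switches, then restore whatever state they had before."""
    previous = {kind: switches.switch(kind, False) for kind in ("SPLIT", "MERGE")}
    try:
        yield
    finally:
        for kind, was_enabled in previous.items():
            switches.switch(kind, was_enabled)


def hfiles_needing_split(hfiles, boundaries):
    """Names of HFiles whose [first_key, last_key] range spans a region boundary.

    boundaries: sorted region start keys, excluding the empty first start key.
    """
    result = []
    for name, first, last in hfiles:
        # Region index of a key = number of boundaries <= key.
        if bisect_right(boundaries, first) != bisect_right(boundaries, last):
            result.append(name)
    return result


hfiles = [("h1", "a", "c"), ("h2", "d", "f")]
print(hfiles_needing_split(hfiles, ["d"]))       # [] (files prepared for these regions)
print(hfiles_needing_split(hfiles, ["b", "d"]))  # ['h1'] (a region split at 'b' mid-load)

sw = FakeSwitches(split=True, merge=False)
with splits_and_merges_disabled(sw):
    pass  # run the snapshot or bulk load here
print(sw.state)  # {'SPLIT': True, 'MERGE': False}
```

Restoring the previous states, rather than blindly re-enabling both switches, keeps a deliberately disabled switch (MERGE in the example) disabled after the operation.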