Member since: 10-22-2015
Posts: 28
Kudos Received: 19
Solutions: 4

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 206 | 08-19-2016 01:32 AM |
| | 1935 | 08-19-2016 12:12 AM |
| | 416 | 07-17-2016 08:59 PM |
| | 633 | 07-12-2016 09:34 PM |
09-21-2018
06:42 PM
Possible scenario:
1. You made a full backup.
2. Added some data.
3. Either restarted a region server with a different port number, or decommissioned a region server.
4. Then ran an incremental backup.
Is this what you have done? @Aparna N
09-21-2018
06:25 PM
Never mind, I will file a bug. Did you add a new RS at that time (during backup execution)? @Aparna N
09-21-2018
05:29 PM
What exact version of HDP do you have, including all minor subversions?
06-19-2018
09:55 PM
So many heavy calls inside critical RPC loops? That is a very strange design decision.
03-31-2017
11:25 PM
6 Kudos
General optimizations
- Do not run the HDFS balancer. It breaks data locality, and data locality is important for latency-sensitive applications.
- For the very same reason, disable HBase automatic region balancing: balance_switch false.
- Disable periodic automatic major compactions for time-series data. Time-series data is immutable (usually no updates/deletes), so the only remaining reason for major compaction is reducing the number of store files, but we will apply a different compaction policy that limits the number of files and does not require major compaction (see below).
- Presplit table(s) with time-series data in advance (see the sketch after this list).
- Disable region splits completely (set DisabledRegionSplitPolicy). Region splitting results in major compaction, and we do not run major compactions because they usually decrease performance and stability and increase operation latencies.
- Enable WAL compression to decrease write IO.
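To make the presplit and DisabledRegionSplitPolicy items above concrete, here is a minimal sketch using the HBase 1.x Java client API. The table name "metrics", family "d", and the split keys are made-up placeholders; real split keys should follow your row key design.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTimeSeriesTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("metrics"));
      // Disable automatic region splits completely for this table.
      desc.setValue(HTableDescriptor.SPLIT_POLICY,
          "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
      desc.addFamily(new HColumnDescriptor("d"));

      // Presplit in advance; placeholder split keys, choose yours from the row key design.
      byte[][] splitKeys = {
          Bytes.toBytes("1"), Bytes.toBytes("2"), Bytes.toBytes("3")
      };
      admin.createTable(desc, splitKeys);
    }
  }
}
```

The balancer switch (balance_switch false) is still run from the HBase shell; it is not a table attribute.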
Table design
- Do not store data in a raw format; use time-series-specific compression (refer to the OpenTSDB row key design).
- Create a coprocessor which runs periodically and compresses the raw data.
- Have separate column families for raw and compressed data.
- Increase hbase.hstore.blockingStoreFiles for both column families.
- Use FIFOCompactionPolicy for raw data (see below).
- Use standard exploring compaction with a limit on the maximum selection size for compressed data (see below).
- Use gzip block compression (GZ) for raw data to decrease write IO.
- Disable block cache for raw data (you will reduce block cache churn significantly).

FIFO compaction
- First-In-First-Out: no compaction at all; TTL-expired data just gets archived.
- Ideal for raw data storage (minimum IO overhead).
- No compaction means no block cache thrashing.
- Sustains hundreds of MB/s of write throughput per RS.
- Available in 0.98.17, 1.2+, HDP-2.4+.
- Refer to https://issues.apache.org/jira/browse/HBASE-14468 for usage and configuration.

Exploring Compaction + Max Size
- Set hbase.hstore.compaction.max.size to some appropriate value (say 500 MB). With the default region size of 10 GB this results in a maximum of 20 store files per region.
- This helps preserve temporal locality of data: data points which are close in time are stored in the same file, distant ones in separate files.
- This compaction works better with the block cache; more efficient caching of recent data is possible.
- Good for the most-recent-most-valuable data access pattern. Use it for compressed and aggregated data (see the sketch after this list).
- Helps to keep recent data in the block cache.
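As a rough illustration of the table design above (not a definitive recipe), the per-column-family settings might be expressed like this with the HBase 1.2+ Java client. The family names "raw" and "c", the TTL, and the numeric values are made-up placeholders; the FIFO policy class name comes from HBASE-14468.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.io.compress.Compression;

public class TimeSeriesTableDesign {
  public static HTableDescriptor describe() {
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("metrics"));

    // Raw data family: FIFO compaction, GZ block compression, no block cache.
    HColumnDescriptor raw = new HColumnDescriptor("raw");
    raw.setConfiguration("hbase.hstore.defaultengine.compactionpolicy.class",
        "org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy");
    raw.setConfiguration("hbase.hstore.blockingStoreFiles", "1000"); // placeholder value
    raw.setCompressionType(Compression.Algorithm.GZ);
    raw.setBlockCacheEnabled(false);
    raw.setTimeToLive(7 * 24 * 3600); // FIFO compaction relies on TTL to age out files

    // Compressed/aggregated data family: exploring compaction capped by max selection size.
    HColumnDescriptor compressed = new HColumnDescriptor("c");
    compressed.setConfiguration("hbase.hstore.compaction.max.size",
        String.valueOf(500L * 1024 * 1024)); // ~500 MB
    compressed.setConfiguration("hbase.hstore.blockingStoreFiles", "100"); // placeholder value

    desc.addFamily(raw);
    desc.addFamily(compressed);
    return desc;
  }
}
```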
Labels: Design & Architecture, HBase, How-To/Tutorial, modern-data-application, timeseries
03-23-2017
02:40 AM
>> Actually my requirement is to scan through 2400 billion rows with 3 where-conditions, and the result of the scan will be around 15 million rows. I need to achieve this in 2 to 3 seconds.
That is about 1000 billion (1 T = 10^12) rows per second. With an average row size of only 1 byte, we are looking at 1 TB/sec scan speed; with 100 bytes per row, 100 TB/sec. I think you should reconsider the design of your application.
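Spelling out that arithmetic (taking roughly 2.4 s from the 2 to 3 second window so the numbers come out round):

\[
\frac{2400 \times 10^{9}\ \text{rows}}{2.4\ \text{s}} = 10^{12}\ \text{rows/s}, \qquad
10^{12}\ \text{rows/s} \times 1\ \text{byte/row} = 1\ \text{TB/s}, \qquad
10^{12}\ \text{rows/s} \times 100\ \text{bytes/row} = 100\ \text{TB/s}.
\]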
03-21-2017
11:26 PM
What I suggested is to compare both times. If they are close enough, then you can rely on both. If there is a significant discrepancy, then I would go with the Unix timing.
03-21-2017
10:09 PM
You can time your command and compare the numbers if you do not trust the number reported by the Sqoop MR import job 🙂: time IMPORT_COMMAND (on Linux).
03-17-2017
09:25 PM
5 Kudos
>> Region Server will take longer to report failure for master to trigger failover
No, the RS does not report failures to the Master. The Master detects a timeout as a listener for ZK events. Every RS (and the Master itself) periodically sends pings to ZK (this timeout, zookeeper.session.timeout, is different from the RPC timeout), and if pings do not arrive for longer than the ZK session timeout, the Master declares the RS dead and starts the failover procedure. In the case of a very large RPC timeout, the client application has no chance to react to outages in any meaningful way. This is the major consequence of an increased RPC timeout.
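As an illustration of that point (not part of the original answer), a client can keep its own timeouts moderate instead of inflating them. The property names are standard HBase client settings, the specific values below are arbitrary examples, and zookeeper.session.timeout remains a server-side hbase-site.xml setting:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClientTimeouts {
  public static Connection connect() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Keep the per-RPC timeout moderate so the client can react to an outage
    // instead of hanging; RS failure detection is driven by zookeeper.session.timeout
    // on the server side, not by this client-side value.
    conf.setInt("hbase.rpc.timeout", 60000);                // 60 s per RPC (example value)
    conf.setInt("hbase.client.operation.timeout", 120000);  // 120 s per operation (example value)
    return ConnectionFactory.createConnection(conf);
  }
}
```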
09-01-2016
07:42 PM
Verify that you use the same hbase-site.xml on both the client and server sides.
08-31-2016
02:20 AM
java.net.SocketTimeoutException: callTimeout=60000, callDuration=60307. Have you changed hbase.rpc.timeout? It seems you have not.
08-25-2016
04:34 PM
Josh,
org.apache.hadoop.hbase.client.Put.setWriteToWAL(Z)V does not exist in 2.3.2;
org.apache.hadoop.hbase.client.Put.setWriteToWAL(Z)[Lorg.apache.hadoop.hbase.client.Put exists.
That looks like an incompatibility issue.
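For reference, the old setWriteToWAL(boolean) call was superseded by setDurability in the HBase 1.x client API; code written against the newer API would look roughly like this (row, family, and qualifier are made-up placeholders):

```java
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutWithoutWal {
  public static Put build() {
    Put put = new Put(Bytes.toBytes("row-1"));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    // Rough equivalent of the old setWriteToWAL(false) in the HBase 1.x API.
    put.setDurability(Durability.SKIP_WAL);
    return put;
  }
}
```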
08-25-2016
06:14 AM
2 Kudos
New table region split/merge API
The new API in HBase in HDP 2.5 allows a user to disable/enable automatic region splits and merges. From the HBase shell you can run the following commands:
Enable region splits: hbase> splitormerge_switch 'SPLIT', true
Disable region splits: hbase> splitormerge_switch 'SPLIT', false
Enable region merges: hbase> splitormerge_switch 'MERGE', true
Disable region merges: hbase> splitormerge_switch 'MERGE', false
Check region split switch status: hbase> splitormerge_enabled 'SPLIT'
Check region merge switch status: hbase> splitormerge_enabled 'MERGE'

Usage in the HBase hbck tool
The HBase hbck tool can use this API automatically during a restore operation if the command-line argument -disableSplitAndMerge is specified or the tool is run in repair mode. Disabling region splits and merges during repair or diagnostic runs improves the tool's ability to diagnose and repair an HBase cluster.

Usage in table snapshots
It is now recommended to disable both region splits and merges before you run the snapshot command. On large tables with many regions, splits and merges during the snapshot operation will result in snapshot failure during the snapshot's verification phase, so it is recommended to disable them completely and restore their state after the snapshot operation:
hbase> splitormerge_switch 'SPLIT', false
hbase> splitormerge_switch 'MERGE', false
hbase> snapshot 'namespace:sourceTable', 'snapshotName'
hbase> splitormerge_switch 'SPLIT', true
hbase> splitormerge_switch 'MERGE', true

Usage during bulk data load
Bulk loads sometimes take a lot of time because the loader tool must split HFiles at new region boundaries. Why? Because, during the operation, some regions can be split or merged, and prepared HFiles which cross these new boundaries must be split. The split operation is performed in a single JVM and may require substantial time. These splits/merges can continue and will require new HFile splits. This chain of events (region split/merge -> HFile splits -> region split/merge -> ...) can be very long. That is why the new split/merge API is important during HBase bulk data loads: disable splits/merges before you run the bulk load, and restore their status after. (A Java sketch of using the same switch around a snapshot follows.)
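If you drive this from Java rather than the shell, the same switch is exposed on Admin in releases that ship the feature. The sketch below is hedged: it assumes the Admin.setSplitOrMergeEnabled(boolean, boolean, MasterSwitchType...) method added together with this API (verify its availability and exact signature in your client version), and the snapshot and table names are placeholders.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.MasterSwitchType;

public class SnapshotWithSwitchesDisabled {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Disable splits and merges for the duration of the snapshot
      // (assumes the switch API is present in this client version).
      admin.setSplitOrMergeEnabled(false, true, MasterSwitchType.SPLIT, MasterSwitchType.MERGE);
      try {
        admin.snapshot("snapshotName", TableName.valueOf("namespace:sourceTable"));
      } finally {
        // Restore the switches afterwards.
        admin.setSplitOrMergeEnabled(true, true, MasterSwitchType.SPLIT, MasterSwitchType.MERGE);
      }
    }
  }
}
```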
Labels: Data Processing, HBase, hdp-2.5, How-To/Tutorial, new-feature
08-19-2016
01:32 AM
org.apache.hadoop.hbase.client.Put.setWriteToWAL(Z)V That is a Flume client issue: the version is not compatible with HBase 1.1.2. Make sure you use the right version of Flume, and if it is the one that ships with HDP 2.3.2, then an issue should be raised for the Flume/HBase incompatibility.
08-19-2016
12:12 AM
1 Kudo
Yes, with IncreasingToUpperBoundRegionSplitPolicy it is possible to have a split of a region which is far from the maximum size; this is expected behavior. The reason? HBase tries to create many regions while they are small and distribute them across the cluster. You will need to switch to ConstantSizeRegionSplitPolicy if you do not want this. hbase.regionserver.region.split.policy controls the setting, and it can be set per HBase table as well (see the sketch below).
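A minimal sketch of switching an existing table to ConstantSizeRegionSplitPolicy with the HBase 1.x Java client; the table name is a placeholder, and the same SPLIT_POLICY attribute can be set at table-creation time instead:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SetConstantSizeSplitPolicy {
  public static void main(String[] args) throws Exception {
    TableName table = TableName.valueOf("mytable"); // placeholder table name
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = admin.getTableDescriptor(table);
      // A per-table split policy overrides the cluster-wide default.
      desc.setValue(HTableDescriptor.SPLIT_POLICY,
          "org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy");
      admin.modifyTable(table, desc);
    }
  }
}
```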
08-11-2016
08:31 PM
@Rao Yendluri, you have not been specific enough in asking the question.
07-27-2016
10:19 PM
1 Kudo
checkAndDelete will always perform the delete and return true if the row is missing. The only way to notify the caller of success/failure (deleted or not) is to write a custom RegionCoprocessor and override the method preCheckAndDeleteAfterRowLock: inside this hook you check whether the row is missing and, if so, bypass the operation and return false; if the row is present, let the normal checkAndDelete processing continue. See the HBase documentation on the coprocessor API (a rough sketch follows).
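A rough, untested sketch of such a RegionObserver against the HBase 1.x coprocessor API; the class name is made up, and you should double-check the hook signature against your exact HBase version:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.ByteArrayComparable;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;

public class CheckAndDeleteRowExistsObserver extends BaseRegionObserver {

  @Override
  public boolean preCheckAndDeleteAfterRowLock(
      ObserverContext<RegionCoprocessorEnvironment> ctx,
      byte[] row, byte[] family, byte[] qualifier,
      CompareOp compareOp, ByteArrayComparable comparator,
      Delete delete, boolean result) throws IOException {

    // Check whether the row actually exists before the delete is applied.
    Result existing = ctx.getEnvironment().getRegion().get(new Get(row));
    if (existing.isEmpty()) {
      // Row is missing: skip the default processing and report failure to the caller.
      ctx.bypass();
      return false;
    }
    // Row is present: let the normal checkAndDelete logic run.
    return result;
  }
}
```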
07-17-2016
08:59 PM
1 Kudo
First of all, do you see 20 regions in the Web UI? If yes, check the data distribution per region (for every region you can get the total store size). You are probably hitting a single region because your data keys are skewed. If you do not know the key distribution, it does not make sense to presplit the table; leave it to HBase.
07-12-2016
09:43 PM
Please post the zoo.cfg content. You may have wrong connection strings in the config file (the server.1=, server.2=, server.3= entries) or zero max client connections (it sometimes happens).
07-12-2016
09:37 PM
Can you check the ZooKeeper log on 10.0.1.105? Maybe it failed to start.
06-12-2016
08:34 PM
1 Kudo
Yes, you cannot create a table until the cluster is up and ready. Please give the Master some time to initialize. If you are not able to access HBase even after a long period of time, something is wrong; please post the Master log here.
06-12-2016
08:31 PM
HBase is not a good alternative to memory-based distributed caches.
06-02-2016
07:54 PM
2 Kudos
Two major reasons for RegionTooBusyException:
1. Failure to acquire the region lock (look for "failed to get a lock in" in the map task log).
2. The region memstore is above its limit and flushes cannot keep up with the load (look for "Above memstore limit").
To mitigate 1, you can increase the maximum busy wait timeout, hbase.ipc.client.call.purge.timeout, in ms (default is 120000), but do not forget to increase hbase.rpc.timeout accordingly (set it to the same value). To mitigate 2, you can increase hbase.hregion.memstore.block.multiplier from the default (4) to some higher value. But the best option for you is the bulk import option: -Dimport.bulk.output=/path/for/output followed by the completebulkload tool (a sketch of the programmatic equivalent follows).
See: https://hbase.apache.org/book.html#arch.bulk.load.complete
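For reference, a minimal sketch of loading the prepared HFiles programmatically with the HBase 1.x LoadIncrementalHFiles class; the output path and table name are placeholders, and the command-line completebulkload tool does the same job:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName tableName = TableName.valueOf("mytable"); // placeholder table name
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin();
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {
      // Load the HFiles produced with -Dimport.bulk.output=/path/for/output
      LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
      loader.doBulkLoad(new Path("/path/for/output"), admin, table, locator);
    }
  }
}
```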
06-02-2016
02:40 AM
If a user is allowed to CREATE/MODIFY a table, he or she can make any configuration changes, including replication. The only way to disable REGION_REPLICATION for regular users while keeping it available to superusers is to patch the existing HBase code, or to open an HBase JIRA and propose an elegant solution for this new security feature without making everything more complex.
06-02-2016
01:36 AM
>> What is it that HBase can do that ML cannot?
- Scales to 1000s of nodes.
- Part of the Hadoop stack with all the bells and whistles (data sink and data source for Spark, Storm, Pig, Hive, etc.).
- 100% open source.
First of all, if the customer is happy with ML, there is no need to push HBase. Otherwise, you should ask them what problem they are trying to solve that does not fit ML well, and post their use case here; we will figure out if it is a good fit for HBase.
05-20-2016
09:03 PM
The default value of 1.2 is good for the majority of cases. You can also play with enabling off-peak compaction (where this ratio is set to 5 by default, as far as I remember): hbase.hstore.compaction.ratio.offpeak.
05-20-2016
08:59 PM
You can always set it directly in hbase-site.xml and then sync it across the cluster. Manual work, though.
... View more