Member since 10-22-2015

28 Posts
19 Kudos Received
4 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1425 | 08-19-2016 01:32 AM |
| | 8865 | 08-19-2016 12:12 AM |
| | 2202 | 07-17-2016 08:59 PM |
| | 3767 | 07-12-2016 09:34 PM |
06-19-2018 09:55 PM

So many heavy calls inside critical RPC loops? That is a very weird design decision.
03-31-2017 11:25 PM
6 Kudos

General optimizations

- Do not run the HDFS balancer. It breaks data locality, and data locality is important for latency-sensitive applications.
- For the very same reason, disable HBase automatic region balancing: balance_switch false.
- Disable periodic automatic major compactions for time-series data. Time-series data is immutable (usually no updates/deletes). The only remaining reason for major compaction is reducing the number of store files, but we will apply a different compaction policy that limits the number of files and does not require major compaction (see below).
- Presplit table(s) with time-series data in advance.
- Disable region splits completely (set DisabledRegionSplitPolicy). Region splitting results in major compaction, and we do not run major compactions because they usually decrease performance and stability and increase operation latencies.
- Enable WAL compression - it decreases write IO.

Table design

- Do not store data in a raw format - use time-series-specific compression (refer to the OpenTSDB row key design).
- Create a coprocessor that runs periodically and compresses the raw data.
- Have separate column families for raw and compressed data.
- Increase hbase.hstore.blockingStoreFiles for both column families.
- Use FIFOCompactionPolicy for raw data (see below).
- Use standard exploring compaction with a limit on the maximum selection size for compressed data (see below).
- Use gzip block compression (GZ) for raw data - it decreases write IO.
- Disable block cache for raw data (you will reduce block cache churn significantly).

FIFO compaction

- First-In-First-Out: no compaction at all; TTL-expired data just gets archived.
- Ideal for raw data storage (minimum IO overhead).
- No compaction means no block cache thrashing.
- Sustains 100s of MB/s write throughput per region server.
- Available in 0.98.17, 1.2+, HDP-2.4+.
- Refer to https://issues.apache.org/jira/browse/HBASE-14468 for usage and configuration.

Exploring Compaction + Max Size

- Set hbase.hstore.compaction.max.size to some appropriate value (say 500 MB). With the default region size of 10 GB, this results in a maximum of 20 store files per region.
- This helps preserve temporal locality of data: data points that are close in time are stored in the same file, distant ones in separate files.
- This compaction works better with the block cache; more efficient caching of recent data is possible.
- Good for the most-recent-most-valuable data access pattern.
- Use it for compressed and aggregated data; it helps keep recent data in the block cache (a table-creation sketch follows this list).
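To make the table-design bullets above concrete, here is a minimal sketch using the HBase 1.x client API. The table name tsdb, the column family names raw and agg, the 7-day TTL, the split keys, and the blockingStoreFiles / max-size values are illustrative assumptions, not settings from the post.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeSeriesTableSketch {

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      HTableDescriptor table = new HTableDescriptor(TableName.valueOf("tsdb")); // hypothetical table name
      // Disable region splits completely (splits trigger major compactions).
      table.setRegionSplitPolicyClassName(
          "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");

      // Raw data family: FIFO compaction, TTL, gzip block compression, no block cache.
      HColumnDescriptor raw = new HColumnDescriptor("raw");
      raw.setConfiguration("hbase.hstore.defaultengine.compactionpolicy.class",
          "org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy");
      raw.setConfiguration("hbase.hstore.blockingStoreFiles", "1000"); // illustrative value
      raw.setTimeToLive(7 * 24 * 3600);                                // FIFO compaction needs a TTL; 7 days is an assumption
      raw.setCompressionType(Compression.Algorithm.GZ);
      raw.setBlockCacheEnabled(false);

      // Compressed/aggregated family: exploring compaction capped by a maximum selection size.
      HColumnDescriptor agg = new HColumnDescriptor("agg");              // hypothetical family name
      agg.setConfiguration("hbase.hstore.compaction.max.size",
          String.valueOf(500L * 1024 * 1024));                           // ~500 MB, as suggested in the post
      agg.setConfiguration("hbase.hstore.blockingStoreFiles", "100");    // illustrative value

      table.addFamily(raw);
      table.addFamily(agg);

      // Presplit the table in advance; real split keys depend on your row key design.
      byte[][] splitKeys = { Bytes.toBytes("1"), Bytes.toBytes("2"), Bytes.toBytes("3") };
      admin.createTable(table, splitKeys);
    }
  }
}
```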
						
					
03-23-2017 02:40 AM

"Actually my requirement is to scan through 2400 billions of rows with 3 where conditions and the result of scan will be around 15 million rows. I need to achieve this 2 to 3 seconds."

That is roughly 1000 billion (1 T = 10^12) rows per second. With an average row size of just 1 byte, we are looking at 1 TB/sec scan speed; with 100 bytes per row, 100 TB/sec. I think you should reconsider the design of your application.
03-21-2017 11:26 PM

What I suggested is to compare both times. If they are close enough, then you can rely on both. If there is a significant discrepancy, then I would go with the Unix timing.
03-21-2017 10:09 PM

You can time your command and compare the numbers if you do not trust the figure reported by the Sqoop MR import job 🙂

time IMPORT_COMMAND (Linux)
09-01-2016 07:42 PM

Verify that you use the same hbase-site.xml on both the client and server sides.
08-31-2016 02:20 AM

java.net.SocketTimeoutException: callTimeout=60000, callDuration=60307

Have you changed hbase.rpc.timeout? It seems you have not.
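If the timeout really does need to be raised on the client side, a minimal sketch with the HBase 1.x client API might look like this. The 120000 ms value and the extra scanner-timeout key are illustrative assumptions, not values from the thread; the same keys can also be set in the client's hbase-site.xml.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RpcTimeoutExample {
  public static void main(String[] args) throws IOException {
    // Loads hbase-site.xml from the classpath; the overrides below apply only to this client.
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.rpc.timeout", 120000);                   // illustrative: 2 minutes instead of the 60000 ms default
    conf.setInt("hbase.client.scanner.timeout.period", 120000); // assumption: long scans often need this raised as well
    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      // ... create Table instances from this connection and run the slow calls here
    }
  }
}
```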
						
					
08-25-2016 04:34 PM

Josh,

org.apache.hadoop.hbase.client.Put.setWriteToWAL(Z)V does not exist in 2.3.2; the variant setWriteToWAL(Z)Lorg/apache/hadoop/hbase/client/Put; (returning a Put) is what exists. That looks like an incompatibility issue.
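If the client code simply wants to control WAL writes, the supported way to express that on HBase 1.x is Put.setDurability rather than the old setWriteToWAL. A minimal sketch follows; the table name, column family, qualifier, and row key are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SkipWalPutExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("events"))) { // hypothetical table
      Put put = new Put(Bytes.toBytes("row-1"));                     // hypothetical row key
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
      put.setDurability(Durability.SKIP_WAL); // replacement for the old setWriteToWAL(false)
      table.put(put);
    }
  }
}
```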
 
						
					
08-25-2016 06:14 AM
2 Kudos

New table region split/merge API

A new API in HBase in HDP 2.5 allows the user to disable/enable automatic region splits and merges. From the HBase shell you can run the following commands:

Enable region splits:
hbase> splitormerge_switch 'SPLIT', true

Disable region splits:
hbase> splitormerge_switch 'SPLIT', false

Enable region merges:
hbase> splitormerge_switch 'MERGE', true

Disable region merges:
hbase> splitormerge_switch 'MERGE', false

Check the region split switch status:
hbase> splitormerge_enabled 'SPLIT'

Check the region merge switch status:
hbase> splitormerge_enabled 'MERGE'

Usage in the HBase hbck tool

The HBase hbck tool can use this API automatically during a restore operation if the -disableSplitAndMerge command-line argument is specified or the tool is run in repair mode. Disabling region splits and merges during repair or diagnostic runs improves the tool's ability to diagnose and repair an HBase cluster.

Usage in table snapshots

It is now recommended to disable both region splits and merges before you run the snapshot command. On large tables with many regions, splits and merges during the snapshot operation will cause the snapshot to fail during its verification phase, so it is recommended to disable them completely and restore their states after the snapshot operation:

hbase> splitormerge_switch 'SPLIT', false
hbase> splitormerge_switch 'MERGE', false
hbase> snapshot 'namespace:sourceTable', 'snapshotName'
hbase> splitormerge_switch 'SPLIT', true
hbase> splitormerge_switch 'MERGE', true

Usage during bulk data load

Bulk loads sometimes take a long time because the loader tool must split HFiles along new region boundaries. Why? Because during the operation some regions can be split or merged, and any prepared HFiles that cross these new boundaries must themselves be split. The split operation is performed in a single JVM and may require substantial time. These splits/merges can continue and will require new HFile splits. This chain of events (region split/merge -> HFile splits -> region split/merge -> ...) can be very long. That is why the new split/merge API is important during HBase bulk data load: disable splits/merges before you run the bulk load and restore their status after.
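The same switches can also be flipped from Java through the Admin API. Whether these exact methods are available depends on the client version (they exist in Apache HBase 1.3+ and may be backported in vendor releases), so treat the following as a hedged sketch; the snapshot and table names mirror the shell example above.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.MasterSwitchType;

public class SnapshotWithSwitchesSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {

      // Turn both switches off (synchronously) before the snapshot, as in the shell example above.
      admin.setSplitOrMergeEnabled(false, true, MasterSwitchType.SPLIT, MasterSwitchType.MERGE);
      try {
        admin.snapshot("snapshotName", TableName.valueOf("namespace:sourceTable"));
      } finally {
        // Restore the switches afterwards.
        admin.setSplitOrMergeEnabled(true, true, MasterSwitchType.SPLIT, MasterSwitchType.MERGE);
      }

      // Check the current state, as splitormerge_enabled does in the shell.
      boolean splitsOn = admin.isSplitOrMergeEnabled(MasterSwitchType.SPLIT);
      boolean mergesOn = admin.isSplitOrMergeEnabled(MasterSwitchType.MERGE);
      System.out.println("splits=" + splitsOn + ", merges=" + mergesOn);
    }
  }
}
```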
						
					
08-19-2016 01:32 AM

org.apache.hadoop.hbase.client.Put.setWriteToWAL(Z)V

That is a Flume client issue; the version is not compatible with HBase 1.1.2. Make sure you use the right version of Flume, and if it is the one that ships with HDP 2.3.2, then an issue should be raised about the Flume/HBase incompatibility.