Member since 09-19-2016

4 Posts
0 Kudos Received
1 Solution

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3745 | 12-08-2016 08:13 PM |

12-08-2016 08:13 PM

So ... after a long hiatus: it turns out this is actually https://issues.apache.org/jira/browse/HBASE-13262.

I was using hbase-client 0.96 with HBase 1.0.0 (CDH 5.5), and our tables held large XML payloads, which triggered the bug whenever hbase.client.scanner.caching was set to a high value.

There are multiple ways to fix this:

- Use hbase-client 0.98+, if you can afford to upgrade without impact.
- Lower the value of hbase.client.scanner.caching in CM (this is what I ended up doing).
- Programmatically, use Scan.setCaching(int) and/or Scan.setMaxResultSize() to avoid the region skipping.
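
For the programmatic option, one possible way to choose a lower caching value is to bound the number of bytes fetched per scanner RPC. The sketch below is only an illustration: `ScanTuning`, `cachingFor`, the 2 MiB budget and the 512 KiB row size are all assumptions made up for this example, not values from the post or from HBase. The comment at the end shows where the result would be handed to the two `Scan` setters named above.

```java
// Hypothetical helper (not part of HBase): derives a conservative scanner
// caching value from a per-RPC byte budget and an estimated row size.
final class ScanTuning {
    private ScanTuning() {}

    /**
     * Returns how many rows to request per scanner RPC so that one batch
     * stays within maxResultBytes, clamped to the range [1, upperBound].
     */
    static int cachingFor(long maxResultBytes, long estimatedRowBytes, int upperBound) {
        if (maxResultBytes <= 0 || estimatedRowBytes <= 0 || upperBound <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        long rows = maxResultBytes / estimatedRowBytes;
        return (int) Math.max(1L, Math.min(rows, (long) upperBound));
    }

    public static void main(String[] args) {
        long budget = 2L * 1024 * 1024; // assumed budget: 2 MiB per RPC
        long rowSize = 512L * 1024;     // assumed size of one large XML row
        int caching = cachingFor(budget, rowSize, 1000);
        System.out.println("caching=" + caching);
        // With hbase-client on the classpath, the values would be applied as:
        //   Scan scan = new Scan();
        //   scan.setCaching(caching);
        //   scan.setMaxResultSize(budget);
    }
}
```

The clamp to at least one row keeps the scan making progress even when a single row is larger than the budget. The upper bound keeps small rows from pushing the caching value back up to the kind of high setting that exposed the bug.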
						
					
12-02-2016 04:32 AM

Thanks for this. It works for Parquet, but how does one do this for a table backed by CSV? Say the CSV schema changes: I want to be able to use Avro schema evolution to create the table. I tried the same CREATE statement, but with STORED AS TEXTFILE and ROW FORMAT DELIMITED etc., and I end up getting null values.
						
					
09-19-2016 01:09 PM

Hello,

I have several HBase tables defined using Avro schemas, and I am trying to write a simple Java function that returns the entire dataset for a given table (all records).

I'm doing something like this (assume the "Customer" Avro schema has been defined):

    DatasetReader<Customer> reader = null;
    RandomAccessDataset<Customer> customers = Datasets.load(PropertyManager.getDatasetURI(HBaseHelper.CUSTOMER), Customer.class);
    reader = customers.newReader();

According to the API docs, this should return the entire unfiltered dataset. The URI method also uses the "dataset:" scheme, so it is not getting a View.

What I'm seeing is that only a very small subset of the table is returned when I get a handle to the iterator: ~20 of the 15000 records that are actually in the table, which is barely 0.1%.

Please advise on how to get all records, and whether this is a defect in Kite. Using the native HBase API is not an option, because the Kite encoding is challenging to work with outside of Kite.

EDIT: we do not seem to see this issue on a single-node HBase, only on an HBase cluster with Kerberos auth.
						
					
Labels:
- Apache HBase
- Kerberos