Member since: 06-28-2017

279 Posts | 43 Kudos Received | 24 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2568 | 12-24-2018 08:34 AM |
|  | 6357 | 12-24-2018 08:21 AM |
|  | 2950 | 08-23-2018 07:09 AM |
|  | 11960 | 08-21-2018 05:50 PM |
|  | 6183 | 08-20-2018 10:59 AM |
08-21-2018 05:59 AM
Try `/homedir/*/inbox/*` as the filename, or maybe better, `/homedir` as the directory and `*/inbox/*` as the filename.
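If this is NiFi's ListFile processor, one way to express that suggestion is through its Input Directory, Path Filter and File Filter properties. A minimal sketch, assuming the directory layout discussed in this thread (the values are illustrative, and both filter properties take regular expressions rather than globs, so check them against the processor documentation):

```
Input Directory        : /homedir
Recurse Subdirectories : true
Path Filter            : .*/inbox    # regex on the subdirectory path below Input Directory
File Filter            : .*          # regex on the file name itself
```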
						
					
08-20-2018 05:40 PM (1 Kudo)
What have you configured as "docker.trusted.registries"? You will need to configure it according to your setup. In a typical setup you run your own Docker registry and declare it as trusted. If you don't already have one, you can use this image to run a registry: https://hub.docker.com/_/registry/
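A minimal sketch of what that setting can look like, assuming a Hadoop 3.x YARN cluster where Docker support is configured in the [docker] section of container-executor.cfg (the registry host and port are placeholders for your own registry):

```
[docker]
  module.enabled=true
  docker.trusted.registries=library,registry.example.com:5000
```

A throwaway registry for testing can be started from the image linked above with `docker run -d -p 5000:5000 --restart=always --name registry registry:2`.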
						
					
08-20-2018 01:57 PM
Whether the PK should be part of the column family depends: in most cases, if it is just a sequential number without additional information, you will not need it there, since it will be used as the row key in HBase. And yes, you will have to list all columns. Actually they don't have the same name, i.e. in the Hive definition it is just 'HJMPTS' while in HBase it is 'cf:HJMPTS'. This is important, as you could now add a new column family which could also contain a column HJMPTS; the column name without the column family name isn't necessarily unique in HBase. In your case it is unique, since you have been migrating an Oracle table.
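As a shortened, hypothetical illustration of the row-key point (table name, column names and types are made up, and only one data column is shown): the column holding the PK is tied to the HBase row key through the special `:key` mapping, so it needs no `cf:` qualifier of its own.

```sql
-- "pk" is exposed as the HBase row key via :key; everything else lives in the
-- column family "cf".
CREATE EXTERNAL TABLE addresses_hbase (
  pk     BIGINT,
  hjmpts BIGINT
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:HJMPTS")
TBLPROPERTIES ("hbase.table.name" = "addresses");
```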
						
					
08-20-2018 12:12 PM
Yes, you can process this with two ListFile processors. So in /homepath all subdirs belong to a customer? Or will there be subdirs not related to customers that you don't want to scan? And for all customer subdirs you want to scan/process the inbox subdir for new files? Assuming you have only customer dirs, your directory pattern can be: `/homedir/*/inbox`
						
					
08-20-2018 10:59 AM (1 Kudo)
Your column mapping is wrong, as stated in the error message. The list in the columns mapping must match your list of columns in the external table definition. You simply list all columns in the form "columnFamilyName:columnName". You seem to have only one column family, 'cf', and I assume the Oracle columns have all been migrated into columns of that column family under the same names. Then you will need the mapping to be: "hbase.columns.mapping"="cf:HJMPTS, cf:CREATEDTS, cf:MODIFIEDDTS, ... , cf:PROPTS, cf:P_ISHOMEADDRESS"
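Spelled out as a complete but shortened table definition, assuming the HBase table is named 'addresses', using only the columns visible in this thread and guessing their Hive types; a row-key column mapped via `:key` is included, as discussed in the earlier reply. The main point is that the column list and the mapping entries line up one to one:

```sql
CREATE EXTERNAL TABLE addresses_hbase (
  rowkey          STRING,      -- HBase row key
  hjmpts          BIGINT,
  createdts       TIMESTAMP,
  modifieddts     TIMESTAMP,
  propts          TIMESTAMP,
  p_ishomeaddress BOOLEAN
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:HJMPTS,cf:CREATEDTS,cf:MODIFIEDDTS,cf:PROPTS,cf:P_ISHOMEADDRESS"
)
TBLPROPERTIES ("hbase.table.name" = "addresses");
```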
						
					
08-20-2018 09:31 AM
You can use a regular expression for the filename in the ListFile processor, so something like "/homepath/customer_[ABC]/*" should be possible. But you will need a pattern that distinguishes customer dirs from other dirs and that will also match potential additional customer dirs.
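Purely as an illustration of such a pattern (the directory naming scheme is hypothetical):

```
customer_[ABC]    # matches only customer_A, customer_B and customer_C
customer_[A-Z]+   # also matches customer dirs added later, e.g. customer_D
```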
						
					
08-16-2018 10:04 AM
It's defining a column name in the filter condition. So in your case it means nothing other than the column with the name Age.
						
					
08-15-2018 07:32 PM
Have a look here: https://www.crackinghadoop.com/email-spark-dataset-html-format/ It focuses on sending datasets, but you should be able to strip the example down to your needs.
						
					
08-15-2018 10:00 AM
I am not completely sure, but I think I came across information that the LIMIT statement causes a repartitioning, so using it has a significant performance impact. Instead you should use TABLESAMPLE, or rewrite the query if it matters which rows you get (and not only how many).
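For example, assuming a Hive table named web_logs (the name is just a placeholder), sampling alternatives to LIMIT look like this:

```sql
-- Block sampling: roughly 1 percent of the input; granularity is the HDFS block /
-- input split, so very small tables may still return everything.
SELECT * FROM web_logs TABLESAMPLE(1 PERCENT) t;

-- Bucket sampling on random values: one of 100 randomly assigned buckets.
SELECT * FROM web_logs TABLESAMPLE(BUCKET 1 OUT OF 100 ON rand()) t;
```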
						
					
08-15-2018 06:57 AM
Are you running the same query from both clients, connected to the same Hive server? Depending on the SQL you are running, a table name in front of a column name can be required, so if the queries differ, that might be the reason.
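For instance, as soon as two tables in a query share a column name, the reference has to be qualified with the table name or alias (the tables here are hypothetical):

```sql
-- "id" exists in both tables, so an unqualified "id" would be ambiguous.
SELECT c.id, c.name, o.id AS order_id
FROM customers c
JOIN orders o ON o.customer_id = c.id;
```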
						
					