Member since 01-09-2019

401 Posts | 163 Kudos Received | 80 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2596 | 06-21-2017 03:53 PM |
| | 4294 | 03-14-2017 01:24 PM |
| | 2388 | 01-25-2017 03:36 PM |
| | 3840 | 12-20-2016 06:19 PM |
| | 2101 | 12-14-2016 05:24 PM |
07-07-2016 12:20 PM
I don't see a reason for the first insert to be a text/uncompressed Avro file. Using HCatalog, you can import from Sqoop directly into a Hive table stored as ORC, which saves you a lot of space because of compression. Once the initial import is in Hive as ORC, you can still transform the data as necessary. If the reason for writing as text is access from Pig and MapReduce, an HCatalog table can also be accessed from Pig/MR.
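A minimal sketch of what such a Sqoop-to-ORC import could look like. All connection details, database, and table names below are placeholders, not values from the thread; the command is only printed, since running it needs a live cluster.

```shell
# Sketch: Sqoop import straight into a Hive ORC table via HCatalog.
# jdbc URL, credentials, and table names are invented placeholders.
SQOOP_CMD="sqoop import \
  --connect jdbc:mysql://dbhost/source_db \
  --username etl_user -P \
  --table customers \
  --hcatalog-database default \
  --hcatalog-table customers_orc \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orcfile'"
# Print the command instead of executing it.
echo "$SQOOP_CMD"
```

The `--hcatalog-storage-stanza` is what makes the created table ORC rather than text.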
07-07-2016 12:36 AM
1 Kudo
If you already have a running cluster, exporting the blueprint from Ambari, editing the relevant entries, and using that to create a DR cluster works. Another approach is to create your first cluster with a blueprint as well; that makes creating the second cluster easier. I don't know of any other custom tools that can create an almost identical cluster.
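The blueprint round trip can be sketched with the Ambari REST API. Hostnames, cluster names, file names, and credentials below are placeholders; the commands are printed rather than executed, since they need a live Ambari server.

```shell
# Sketch of export -> register -> create with Ambari blueprints.
AMBARI="http://ambari-host:8080/api/v1"
AUTH="-u admin:admin -H X-Requested-By:ambari"

# 1) Export the running cluster as a blueprint.
EXPORT="curl $AUTH $AMBARI/clusters/prod?format=blueprint -o prod-blueprint.json"
# 2) Register the (edited) blueprint under a new name.
REGISTER="curl $AUTH -X POST -d @prod-blueprint.json $AMBARI/blueprints/dr-blueprint"
# 3) Create the DR cluster from the blueprint plus a host-mapping template.
CREATE="curl $AUTH -X POST -d @dr-cluster-template.json $AMBARI/clusters/dr"

# Print only; running these requires a live Ambari server.
printf '%s\n' "$EXPORT" "$REGISTER" "$CREATE"
```

The host-mapping template (step 3) is where the DR cluster's own hostnames go, which is the main thing to edit between the two clusters.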
07-06-2016 06:32 PM
1 Kudo
A quick workaround is to disable Ranger authorization under Hive -> Authorization in the Ambari UI and restart Hive. However, if Ranger is part of your tests and you want to keep it enabled, this is not the solution.
07-06-2016 06:16 PM
A quick workaround is to disable Ranger authorization under Hive -> Authorization in the Ambari UI and restart Hive. However, if Ranger is part of your tests and you want to keep it enabled, this is not the solution.
07-06-2016 03:13 PM
1 Kudo
Checkpointing is the process of merging edit logs with the base fsimage; the result is stored in the NameNode metadata directories. It is not the same as the edit log, since the edit log holds the changes that you make to HDFS.
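For a non-HA NameNode, a checkpoint can also be forced by hand; a sketch of the steps, run as the hdfs user. The commands are printed rather than executed, since they need a running cluster.

```shell
# Sketch: manually forcing a checkpoint on a non-HA NameNode.
CHECKPOINT="hdfs dfsadmin -safemode enter   # block writes so the image is consistent
hdfs dfsadmin -saveNamespace    # merge edits into a fresh fsimage on disk
hdfs dfsadmin -safemode leave   # resume normal operation"
# Print only; these need a live HDFS cluster and hdfs superuser rights.
echo "$CHECKPOINT"
```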
07-05-2016 04:07 PM
@bganesan Now that https://issues.apache.org/jira/browse/RANGER-205 is fixed, can we use the REST API instead of the DB script?
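If the REST route works, creating a policy would look roughly like the following. The host, credentials, and policy JSON are placeholders, and the public endpoint path (v1 vs v2 API) should be verified against your Ranger version; the command is only printed.

```shell
# Sketch: creating a Ranger policy over REST instead of the DB script.
# ranger-host, admin credentials, and policy.json are placeholders.
RANGER="http://ranger-host:6080"
CREATE_POLICY="curl -u admin:admin -X POST \
  -H 'Content-Type: application/json' \
  -d @policy.json \
  $RANGER/service/public/v2/api/policy"
# Print only; running it needs a live Ranger admin server.
echo "$CREATE_POLICY"
```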
07-01-2016 09:07 PM
1 Kudo
When you click on the OVA to import it, you will see a Guest OS Type. What do you see there, and has it been changed? You should see Red Hat (64-bit). By default you don't change this; if you do change it, the import is not going to work.
07-01-2016 03:47 PM
If you want to use the files as is, then yes. But do you have the files already split by date? In that case, you will need the date as both a column and a partition (with different names). You may be better off reorganizing these files into ORC for better lookup speeds. To do that, create a second table stored as ORC and do an insert overwrite.
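A sketch of that text-to-ORC reorganization. The table and column names are invented, since the thread gives no schema; the HiveQL is only printed here and would be run with beeline/hive on a real cluster.

```shell
# Sketch: copy a text table into a partitioned ORC table.
# events_text, events_orc, and the columns are made-up names.
HQL="
CREATE TABLE events_orc (id BIGINT, payload STRING, event_date STRING)
PARTITIONED BY (dt STRING)
STORED AS ORC;

SET hive.exec.dynamic.partition.mode=nonstrict;
-- event_date stays as a plain column; dt carries the same value as the partition key
INSERT OVERWRITE TABLE events_orc PARTITION (dt)
SELECT id, payload, event_date, event_date AS dt
FROM events_text;
"
echo "$HQL"
```

Note the date appears twice under different names (`event_date` as a column, `dt` as the partition), matching the constraint above.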
07-01-2016 03:29 PM
Why do you need to include the date information as a column? If you are creating a merge using Pig (or a Hive query), you can move the date field from a column into a partition.
07-01-2016 03:08 PM
1 Kudo
It is difficult to say what your PARTITION should be from this information alone. The best way to find a partition is to look at future query patterns. If you know most queries will have a WHERE clause on a value that is not high-cardinality, that could be your partition. If your queries mostly hit a date range, partition by date.
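A small sketch of why a date partition pays off: the WHERE clause below prunes the scan to the matching partitions instead of reading the whole table. The names are invented, and the HiveQL is only printed here.

```shell
# Sketch: date-partitioned table plus a range query that benefits from pruning.
# sales, item, amount, sale_date are made-up names.
HQL="
CREATE TABLE sales (item STRING, amount DOUBLE)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

-- Only partitions inside the range are scanned:
SELECT SUM(amount)
FROM sales
WHERE sale_date BETWEEN '2016-06-01' AND '2016-06-30';
"
echo "$HQL"
```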