Member since 09-15-2015

457 Posts · 507 Kudos Received · 90 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 16916 | 11-01-2016 08:16 AM |
| | 12551 | 11-01-2016 07:45 AM |
| | 11635 | 10-25-2016 09:50 AM |
| | 2484 | 10-21-2016 03:50 AM |
| | 5239 | 10-14-2016 03:12 PM |

03-03-2016 08:50 PM · 2 Kudos

Usually, when you want to use curl against a Kerberos-secured cluster, you have to use the following command:

```
curl --negotiate -u : -X GET 'http://localhost:50111/templeton/v1/hive?user.name=ekoifman'
```

Make sure you have a valid Kerberos ticket (run `klist` to check).
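For completeness, a minimal sketch of the full flow; the principal and keytab names here are hypothetical:

```bash
# Obtain a Kerberos ticket first (principal and keytab are made-up examples)
kinit -kt /etc/security/keytabs/ekoifman.keytab ekoifman@EXAMPLE.COM

# Verify that the ticket is valid
klist

# Call the WebHCat endpoint with SPNEGO (--negotiate) authentication
curl --negotiate -u : -X GET 'http://localhost:50111/templeton/v1/hive?user.name=ekoifman'
```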
						
					
03-03-2016 07:25 AM · 1 Kudo

What Ambari version are you using? Is it the latest one (2.2.1.0)? You might have to upgrade to the latest version so that you have the latest stack information.
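If you are unsure which version is installed, a quick way to check from the Ambari host:

```bash
# Print the installed Ambari Server version
ambari-server --version

# Confirm the Ambari Server process is up
ambari-server status
```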
						
					
03-01-2016 05:44 AM · 1 Kudo

Important: only format the NameNode if you do not have any data in your cluster! Formatting creates a fresh namespace, so any existing HDFS data becomes inaccessible.
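For reference, this is the standard format command; it should only ever be run on a freshly installed, empty cluster:

```bash
# Initializes a new HDFS namespace on the NameNode host (run as the hdfs user).
# WARNING: this wipes the existing filesystem metadata.
hdfs namenode -format
```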
						
					
02-25-2016 07:57 PM · 1 Kudo

							 Great content, thanks for sharing! 
						
					
02-25-2016 01:01 PM · 3 Kudos

Hi @prakash pal, there are some differences between these data types. Basically, STRING allows a variable length of characters (max. 32K chars), while CHAR is a fixed-length string (max. 255 chars). Usually (and I doubt that this is different with Impala) CHAR is more efficient, can speed up operations, and is better regarding memory allocation. (This does not mean you should always use CHAR.)

See this:

"All data in CHAR and VARCHAR columns must be in a character encoding that is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use a STRING column to hold it."

There are a lot of use cases where it makes sense to use CHAR instead of STRING. For example, say you want a column that stores a two-letter country code (ISO 3166-1 alpha-2, e.g. US, ES, GB, ...); here it makes more sense to use CHAR.
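As a quick illustration, a hypothetical table sketch (the database, table, and column names are assumptions):

```bash
# Fixed-width CHAR for the country code, STRING for variable-length text
impala-shell -q "
CREATE TABLE demo.customers (
  name         STRING,   -- variable-length text (up to 32K chars)
  country_code CHAR(2)   -- fixed two-letter ISO 3166-1 alpha-2 code
);"
```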
						
					
02-25-2016 06:28 AM · 1 Kudo

This might be a Parquet problem, but it could also be something else. I have seen some performance and job issues when using Parquet instead of ORC. Have you seen this: https://issues.apache.org/jira/browse/HDFS-8475? What features are you missing regarding Spark with ORC? I have seen your error before, but in a different context (a query on an ORC table was failing). Make sure your HDFS daemons (especially the DataNodes) are running and healthy. It might be related to some bad blocks, so make sure the blocks related to your job are OK.
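To rule out bad blocks, a quick health check; the warehouse path below is hypothetical, so point it at the data your job actually reads:

```bash
# Overall filesystem health, listing any corrupt files
hdfs fsck / -list-corruptfileblocks

# Detailed block report for a specific path (path is a made-up example)
hdfs fsck /apps/hive/warehouse/mytable -files -blocks -locations
```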
						
					
02-22-2016 06:16 PM · 1 Kudo

You should be able to see the query in the HiveServer log or in a Hive-related UI, like the Hive View in Ambari or Hue (there should be a query history). The ResourceManager does not show the full query, because the job is only named after a fragment of the query. Why only a fragment? Some queries can be quite large, and the job name is limited in the number of characters allowed.
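One way to pull recent queries out of the HiveServer2 log; the log path and the exact log pattern are assumptions and may differ per distribution:

```bash
# List the most recent queries logged by HiveServer2 (path/pattern assumed)
grep 'Executing command' /var/log/hive/hiveserver2.log | tail -n 20
```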
						
					
02-22-2016 06:41 AM · 1 Kudo

Take a look at this question; maybe it is helpful: https://community.hortonworks.com/questions/3012/what-are-the-steps-an-operator-should-take-to-repl.html
						
					
02-18-2016 05:55 PM · 2 Kudos

You can specify the number of mappers that will be used for the DistCp job:

`-m <num_maps>` : Maximum number of simultaneous copies. Specifies the number of maps used to copy data. Note that more maps may not necessarily improve throughput.

If nothing is specified, the default is 20 map tasks, per the DistCp source:

```java
/* Default number of maps to use for DistCp */
public static final int DEFAULT_MAPS = 20;
```
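For example, capping a copy at 50 parallel map tasks; the NameNode addresses and paths are hypothetical:

```bash
# Copy /data/src from one cluster to another using at most 50 maps
hadoop distcp -m 50 hdfs://nn1:8020/data/src hdfs://nn2:8020/data/dest
```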
						
					
02-18-2016 07:05 AM · 3 Kudos

This sounds like the issue mentioned here: https://github.com/cloudera/hue/issues/304. However, I don't know a valid workaround for our Hue version at the moment. I strongly encourage you to use different ways to ingest large amounts of data into your cluster, e.g. a separate data ingestion node (plus hdfs commands to move files into HDFS), NiFi, distcp, ...
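For instance, from an ingestion or edge node; the paths below are hypothetical:

```bash
# Copy a large local file straight into HDFS, bypassing the Hue uploader
hdfs dfs -put /staging/large_file.csv /data/incoming/
```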
						
					