Member since 08-16-2016

642 Posts · 131 Kudos Received · 68 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3976 | 10-13-2017 09:42 PM |
|  | 7470 | 09-14-2017 11:15 AM |
|  | 3796 | 09-13-2017 10:35 PM |
|  | 6031 | 09-13-2017 10:25 PM |
|  | 6598 | 09-13-2017 10:05 PM |

06-26-2017 01:58 AM · 1 Kudo

Rack awareness serves three purposes: data locality, data redundancy, and reducing network bandwidth requirements. The replication factor sets your data redundancy level. It does not seem wise to arbitrarily change either just because the cluster is growing; simply buy more nodes and expand.

To address the original question:

1. Lowering the replication factor will mark the third replica of each block as excess and remove it. Due to the HDFS write workflow, the remaining two replicas will be split between at least two racks.

2. Adjusting the rack topology will not move any existing data. It will affect MR job performance, as blocks may no longer be local under the new rack topology. Newly written data will be split between the two racks.

No matter the order, if you do both, you risk both replicas of a block ending up within the same rack. You can run the balancer immediately afterward, and that should help since the balancer abides by the new rack topology, but it will not touch or move every block.
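
For reference, a minimal sketch of the commands involved; the path and target factor here are hypothetical:

```sh
# Lower the replication factor on existing data; -w waits until the
# excess replicas have actually been removed.
hdfs dfs -setrep -w 2 /data/warehouse

# After updating the rack topology, run the balancer so placement
# follows the new topology (it will not necessarily move every block).
hdfs balancer
```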
						
					
06-24-2017 05:44 AM

I am not positive on this, but I think this is a HiveServer2 (HS2) setting, since HS2 is where the decision is made whether to run a query locally or to launch an MR job. Try applying the change to HS2 and restarting it.
						
					
06-23-2017 06:58 AM

Ah, what you are looking for is the setting Fetch Task Query Conversion (`hive.fetch.task.conversion`). Setting this to `none` will force all queries to run as MR jobs.
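
A minimal illustration from inside a Hive session (beeline or the Hive CLI); the table name is hypothetical:

```sql
-- Disable fetch-task conversion so even simple queries are compiled
-- into MapReduce jobs instead of being served as a local fetch.
-- Valid values for this property are none, minimal, and more.
SET hive.fetch.task.conversion=none;

SELECT * FROM web_logs LIMIT 10;  -- now launches an MR job
```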
						
					
06-23-2017 06:30 AM

This has to do with the YARN memory settings. The amount of memory allocated to YARN is only 8 GB. I don't know what the minimum container size is, but it is probably around 1.3 GB. The combination of the two determines the number of containers that can be launched, which for your cluster works out to 6 containers (8 GB / 1.3 GB ≈ 6). Anything beyond that will have to wait for resources to be freed up.

https://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html
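
As a sketch, these are the two properties in play; the values below are assumptions chosen to match the cluster described above:

```xml
<!-- yarn-site.xml (values are illustrative, not recommendations) -->
<property>
  <!-- Total memory YARN may hand out on each NodeManager -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <!-- Smallest container the scheduler will allocate;
       8192 MB / 1331 MB gives roughly 6 concurrent containers -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1331</value>
</property>
```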
						
					
06-23-2017 06:14 AM

No, you cannot. That file is used to store impala-shell configuration settings (e.g., -k for Kerberos), not Impala session variables.

https://www.cloudera.com/documentation/enterprise/5-3-x/topics/impala_shell_options.html
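
For reference, a sketch of what that file holds, assuming the usual $HOME/.impalarc location; the option names are assumed to mirror impala-shell's long-form flags (check the linked docs for the exact spellings):

```
# ~/.impalarc -- impala-shell startup options, not session variables
[impala]
impalad=impalad-host.example.com:21000
kerberos=true
```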
						
					
06-23-2017 05:58 AM

That setting, mapreduce.framework.name, can be found in mapred-site.xml. Check for it under /etc/hadoop/conf/ and verify its value. If it is there with yarn as the value, then it is likely that HS2 is not running with the correct Hadoop environment variables, such as HADOOP_CONF_DIR. If it isn't there, or the value is incorrect, try installing the YARN gateway role on the HS2 host.
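
A quick way to check, assuming the default client configuration path mentioned above:

```sh
# Print the property and the line after it (the <value> element);
# on a YARN cluster you would expect to see <value>yarn</value>.
grep -A1 'mapreduce.framework.name' /etc/hadoop/conf/*-site.xml

# Confirm which configuration directory the HS2 process is actually using.
echo "$HADOOP_CONF_DIR"
```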
						
					
06-22-2017 06:46 PM

The database it is trying to access is the backend of the Hive Metastore. Are you able to access and view databases and tables in Hive?
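
A minimal sanity check from a Hive session (beeline or the Hive CLI):

```sql
-- If the metastore backend is reachable, both should return promptly.
SHOW DATABASES;
SHOW TABLES IN default;
```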
						
					
06-19-2017 11:06 AM

@andrzej_jedrzej what specifically led you to this? I know at times it can be difficult to troubleshoot issues in NTP, and the various commands get confusing (i.e., ntpdate, ntpq, etc.). Chrony and NTP look very similar in installation and configuration. What exactly is so different between them?
						
					
06-19-2017 09:20 AM · 1 Kudo

The table definition declares the virtual/partition column, and in HDFS each partition is created as a directory (or subdirectory) under the table directory. So Hive checks the table definition, looks for directories under the table directory whose names match the partition column, and then prunes by the value.
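
A small sketch of the layout and the pruning it enables; the table name and paths are hypothetical:

```sql
-- Partitioned table; dt is a virtual column, not stored in the data files.
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING);

-- Each partition becomes a directory under the table's HDFS location:
--   .../warehouse/sales/dt=2017-06-18/
--   .../warehouse/sales/dt=2017-06-19/

-- The predicate on dt lets Hive read only the matching directory.
SELECT SUM(amount) FROM sales WHERE dt = '2017-06-19';
```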
						
					
06-19-2017 09:18 AM

Ah yes, sorry: addprinc is for adding the principal to the Kerberos database. add_entry is for adding an entry to be written to a keytab file using ktutil. Yes, do add_entry for cloudera/admin@IM, and then wkt.
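
A minimal ktutil session as a sketch, assuming MIT Kerberos, kvno 1, and the aes256-cts encryption type (adjust to match the principal's actual kvno and enctypes):

```sh
ktutil
# Add a password-derived key for the principal to the in-memory keylist.
ktutil:  add_entry -password -p cloudera/admin@IM -k 1 -e aes256-cts
# Write the keylist out to a keytab file.
ktutil:  wkt cloudera-admin.keytab
ktutil:  quit
```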
						
					