Member since 01-24-2021
9 Posts
0 Kudos Received
0 Solutions
			
    
	
		
		
01-18-2024 10:04 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Many thanks.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
01-08-2024 10:33 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
As the admin of a CDH cluster, I sometimes see a query that has been submitted to HiveServer2 but stays inside HiveServer2 in the explain stage. It is never submitted to the YARN cluster, so it has no application ID on YARN. A very long query makes HiveServer2 misbehave, for example:

select * from aaa where code in ('xxx','xxx1','xxx3',.......'xxx2000000');

A SQL statement with more than a million entries like this can bring HiveServer2 down.

The web page on port 10002 does not seem to have an action button for dealing with a query, as the YARN web UI does.

The CDH version is 6.3.2 and the Hive version is 2.1.1. When this situation occurs, I have to restart HiveServer2. Is there a way to kill a query by Hive query ID or Hive session ID instead of by YARN application ID?

This would also be useful when someone queries the metastore from many threads, for example a "use hadoop" statement with 30 or more active connections. The administrator should be able to kill them forcibly.
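One possible way to keep such a statement small, independent of whether a running query can be killed, is to put the values in a lookup table and join against it instead of sending a literal IN list. The following is only a minimal sketch under assumptions: the table `aaa` and column `code` come from the example above, while the lookup table name `code_filter`, the file path, and both helper functions are hypothetical. Loading the file into the lookup table is left to whatever mechanism the cluster already uses.

```python
import os
import tempfile


def write_codes_file(codes, path):
    """Write one code per line, skipping duplicates; return the number written."""
    seen = set()
    written = 0
    with open(path, "w", encoding="utf-8") as handle:
        for code in codes:
            if code in seen:
                continue
            seen.add(code)
            handle.write(code + "\n")
            written += 1
    return written


def build_semi_join_query(table, column, lookup_table):
    """Build a short Hive statement that filters `table` by the lookup table.

    `lookup_table` is a hypothetical table holding one `column` value per row.
    """
    return (
        f"SELECT t.* FROM {table} t "
        f"LEFT SEMI JOIN {lookup_table} f ON t.{column} = f.{column}"
    )


if __name__ == "__main__":
    # Hypothetical values standing in for the very long IN list.
    path = os.path.join(tempfile.mkdtemp(), "codes.txt")
    count = write_codes_file(["xxx", "xxx1", "xxx3", "xxx1"], path)
    print(count, "codes written to", path)
    print(build_semi_join_query("aaa", "code", "code_filter"))
```

The statement text stays the same length no matter how many codes there are, so HiveServer2 no longer has to parse millions of literals.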
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Hive
 
			
    
	
		
		
09-15-2023 02:06 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
I use CDH 6.3.2, Hive 2.1, Hadoop 3.0, and Hive on Spark on a YARN cluster.

hive.merge.sparkfiles=true;
hive.merge.orcfile.stripe.level=true;

With this configuration, when the result is small, the output files of the 1099 reducers are merged into one file. The merged file then contains about 1099 stripes, and reading it is very slow.

I tried hive.merge.orcfile.stripe.level=false; and the result is what I wanted: one small file with one stripe that reads fast.

Can anyone explain the difference between true and false? Why is "hive.merge.orcfile.stripe.level=true" the default?
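The observation above can be written down as a toy model. It assumes, as the behavior described suggests, that a stripe-level merge keeps every existing stripe as it is, while a row-level merge rewrites the rows into fresh stripes. The 64 MiB stripe size, the 10 MiB total size, and the function name are illustrative assumptions, not values taken from this cluster, and a real writer's stripe count also depends on memory and compression.

```python
import math

# Assumed stripe size for the row-level rewrite (illustrative value only).
ASSUMED_STRIPE_SIZE_BYTES = 64 * 1024 * 1024


def stripes_after_merge(input_stripe_counts, total_bytes, stripe_level,
                        stripe_size_bytes=ASSUMED_STRIPE_SIZE_BYTES):
    """Estimate the stripe count of the merged file in this simplified model."""
    if stripe_level:
        # Existing stripes are kept as-is, so their counts simply add up.
        return sum(input_stripe_counts)
    # Rows are rewritten, so the data is packed into as few stripes as fit.
    return max(1, math.ceil(total_bytes / stripe_size_bytes))


# 1099 small reducer outputs with one stripe each, 10 MiB in total (hypothetical).
small_outputs = [1] * 1099
total = 10 * 1024 * 1024
print(stripes_after_merge(small_outputs, total, stripe_level=True))   # 1099
print(stripes_after_merge(small_outputs, total, stripe_level=False))  # 1
```

In this model the stripe-level path never reduces the number of stripes, which matches the slow read seen with many tiny stripes in one file.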
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
02-18-2023 12:05 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
I ran hdfs fsck with the delete option and then found that the DataNode configuration was wrong: two directories were missing from the DataNode storage directory configuration. Is there any possible way to rebuild the lost corrupt blocks? Many thanks.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
11-28-2022 11:09 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Thanks a lot.

This "yarn application -updatePriority 10 -appId application_xxxx_xx" seems to be a YARN-level setting. It does not work for Spark 2.x in CDH 6.3.2 either.

Is that for the same reason, meaning that 'Application Priority' requires the YARN version and the Spark version to match?
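For anyone scripting the command quoted above, here is a minimal sketch that only builds and validates the argument list. It does not run anything and says nothing about whether a given YARN or Spark version honors the priority. The function name and the validation rules are assumptions made for illustration.

```python
import re

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"^application_\d+_\d+$")


def build_update_priority_cmd(app_id, priority):
    """Return the argv list for the updatePriority command quoted above."""
    if not APP_ID_PATTERN.match(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    if isinstance(priority, bool) or not isinstance(priority, int) or priority < 0:
        raise ValueError(f"priority must be a non-negative integer: {priority!r}")
    return ["yarn", "application", "-updatePriority", str(priority),
            "-appId", app_id]


if __name__ == "__main__":
    # Hypothetical application ID, used only to show the resulting command.
    print(" ".join(build_update_priority_cmd("application_1669600000000_0042", 10)))
```

On a host with the yarn CLI on the PATH, the returned list could be passed to `subprocess.run(cmd, check=True)`.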
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
11-24-2022 06:11 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
YARN's application priority is shown in the YARN ResourceManager web UI on port 8088. How does priority work in YARN?

I am currently using CDH 6.3.2 with Hadoop 3.0.0 and Hive 2.1.1, and I use Hive on Spark. Can I use a config to manage the application priority?

Some Hive SQL statements should get a high application priority while they are pending, so that they run ahead of the others instead of running at the same time and sharing the compute resources equally.

When I set the following configs in Hive, they do not seem to take effect in YARN.

MapReduce: "-Dmapreduce.job.priority=xx"
Flink: "-yD yarn.application.priority=xx"
Spark: "spark.yarn.priority=xx"

In Hive SQL, "set spark.yarn.priority=10;" does not work ...
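To keep the three per-engine settings from the list above in one place, a small helper like the following could render them. It is only a sketch: the keys are copied from the post (with the Flink key spelled `yarn.application.priority`), the function names are hypothetical, and it does not check whether a given engine or YARN version actually honors the key.

```python
# Per-engine priority settings as listed in the post; {value} is filled in later.
PRIORITY_SETTINGS = {
    "mapreduce": "-Dmapreduce.job.priority={value}",
    "flink": "-yD yarn.application.priority={value}",
    "spark": "spark.yarn.priority={value}",
}


def priority_setting(engine, value):
    """Render the priority setting string for one engine."""
    try:
        template = PRIORITY_SETTINGS[engine.lower()]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    return template.format(value=value)


def hive_set_statement(value):
    """Render the Hive session statement tried in the post."""
    return f"set spark.yarn.priority={value};"


if __name__ == "__main__":
    for name in PRIORITY_SETTINGS:
        print(name, "->", priority_setting(name, 10))
    print(hive_set_statement(10))
```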
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Hive, Apache Spark, Apache YARN
 
			
    
	
		
		
01-24-2021 09:59 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hi, this seems to be the same error. Excuse me, where is the log of the CM YARN usage aggregation job?

I also set the pool rules and get an error with a user like admin, which has no matching group. User admin cannot submit jobs to YARN, but other normal users can.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
01-24-2021 07:54 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hi, in the development environment there is a service named CM YARN usage aggregation that runs in YARN every hour. It can be found in the JobHistoryServer web UI, but in the test environment it does not appear. The difference between the two is that the development environment was started as root, while the test environment was started by a user with sudo privileges. How can I find the startup log of CMyarnusageaggregation to debug the problem? The log aggregation properties are enabled in both environments. Many thanks!
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache YARN