Member since 
    
	
		
		
		03-04-2021
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
                5
            
            
                Posts
            
        
                2
            
            
                Kudos Received
            
        
                0
            
            
                Solutions
            
        
			
    
	
		
		
		04-07-2021
	
		
		12:31 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Hi @Magudeswaran ,  I can rewrite the query using sub-queries, but I have the same error:  select count(*) as num_all_changes  from (  select s.cod_pers, count(*) as num_changes  from (  select t.cod_pers, t.cod_address, count(*) as num_address  from be_prd_prt.test_case as t  group by t.cod_pers, t.cod_address  ) as s  group by s.cod_pers  having count(*)>1  ) as gg  ;  +------------------+  | num_all_changes |  +------------------+  | 63 |  | 58 |  | 64 |  | 59 |  +------------------+  4 rows selected (18.077 seconds)     As you can see, always 4 rows... 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		04-07-2021
	
		
		12:26 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Hi asish,  I collected statistics again (I already did it before posting), but the result is the same: 4 rows instead 1.  +------------------+  | num_all_changes |  +------------------+  | 63 |  | 58 |  | 64 |  | 59 |  +------------------+  4 rows selected (13.45 seconds) 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		03-31-2021
	
		
		07:59 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 The "msck repair table ..." command does not really read new data files, but adds new partitions (subdirectories in HDFS) in table metadata.  What you could do is to create in advance all the partitions (for month or more) - initially empty- and run the "repair" command just once:  hdfs dfs -mkdir /user/data/year=2021/month=04/day=1  ...  hdfs dfs -mkdir /user/data/year=2021/month=04/day=30  hive>msck repair table <table_name>     When You put your log files inside one of these directories, they will be immediately visible from Hive (just set the correct permissions using Ranger or hdfs).     Maybe You can repeat this operations (create directories and "repair table") during logs maintenance, as you should have some policies to remove old logs     Hope this helps 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		03-31-2021
	
		
		02:18 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 Hi,  the simple count(*) on the base table is working correctly:  select count(*) from be_prd_prt.test_case;  +---------+  | _c0 |  +---------+  | 537586 |  +---------+  1 row selected (0.2 seconds)  I have troubles with the query I wrote, because it returns a wrong number of rows (not just 1, as expected) 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		03-31-2021
	
		
		01:17 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Hi,  on Hive 3.1.4 we have a COUNT(*) query which returns more than one rows, instead of exactly one.  I created this test table:  CREATE TABLE test_case (  cod_pers STRING,  cod_address STRING,  PRIMARY KEY (cod_pers) DISABLE NOVALIDATE )  PARTITIONED BY (num_snapshot BIGINT) ;  and populated it with more than 500,000 rows in 3 partitions.  To find how many people changed address, I wrote this query:  WITH data1 AS ( -- eliminate duplicates  select t.cod_pers, t.cod_address, count(*) as num_address  from test_case as t  group by t.cod_pers, t.cod_address  ), data2 AS ( -- find changes per person  select s.cod_pers, count(*) as num_changes  from data1 s  group by s.cod_pers  having count(*)>1 )  select count(*) as num_all_changes from data2 as gg ;  Instead of obtaining a single row with the total number, the query returns 4 rows :  +------------------+  | num_all_changes |  +------------------+  | 63 |  | 58 |  | 64 |  | 59 |  +------------------+  4 rows selected (1.252 seconds)  4 is also the number of reducers used by the query.  If I add a "LIMIT 1" clause, the query works as expected, and a final reducer is added in the query execution. The same happens adding "GROUP BY 1" clause.     I tried to COMPACT the table and have the statistics recalculated, but nothing changed.  I know I can rewrite the query, but I'm not looking for a work-aroud: I wonder if it's a known bug of CBO and if someone else experimented the same behavior (and also if a patch is available).     Thanks in advance 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
							Labels:
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
 
        

