Member since 
    
	
		
		
		08-18-2017
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
                146
            
            
                Posts
            
        
                19
            
            
                Kudos Received
            
        
                17
            
            
                Solutions
            
        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| 3377 | 05-09-2024 02:50 PM | |
| 7350 | 09-13-2022 10:50 AM | |
| 3326 | 07-25-2022 12:18 AM | |
| 5133 | 06-24-2019 01:56 PM | |
| 2869 | 10-13-2018 04:40 AM | 
			
    
	
		
		
		05-09-2024
	
		
		02:50 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		2 Kudos
		
	
				
		
	
		
					
							 Hive can't understand (20) as negative value. You should explicitly mention -20 in column value.  ie.,   +----------------+---------------+---------------+---------------+---------------+
| test.currency  | test.spend_a  | test.spend_b  | test.spend_c  | test.spend_d  |
+----------------+---------------+---------------+---------------+---------------+
| GBP            | 50            | 25            | -20           | 5             |
| JPY            | 50            | 25            | -20           | 5             |
| AUD            | 50            | 25            | -20           | 5             |
+----------------+---------------+---------------+---------------+---------------+
select Currency, (spend_a + spend_b + spend_c + spend_d) as total_spend from test;
+-----------+--------------+
| currency  | total_spend  |
+-----------+--------------+
| GBP       | 60.0         |
| JPY       | 60.0         |
| AUD       | 60.0         |
+-----------+--------------+ 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		04-26-2024
	
		
		08:41 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 I dint see the full query in your description. Are you aggregating (SUM) delta column based on customer_id ?   I did the following simple repro without aggregates & able to get the results as expected.  create external table Customer(customer_id string, money_spent bigint) stored as textfile;  insert into customer values("0001", 500), ("0002", 250), ("0003", 100);    create external table sales(customer_id string, order_id string, amount bigint) stored as textfile;  insert into sales values("0001", "5005", 500), ("0002", "5006", 250), ("0003", "5007", 50);    select "Sales Check" as rule, (a.money_spent - b.amount) as delta from customer a join sales b on a.customer_id = b.customer_id;  +--------------+--------+  | rule | delta |  +--------------+--------+  | Sales Check | 0 |  | Sales Check | 0 |  | Sales Check | 50 |  +--------------+--------+  3 rows selected (15.186 seconds)  Let me know if this helps. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		04-25-2024
	
		
		07:51 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 If you want to join the tables, you should have primary-foreign key relation.  ie., sales table should have customer_id column or customer table should have order_id column to map rows between 2 tables. Otherwise this is not achievable. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		04-23-2024
	
		
		06:59 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 If its a tez application, AM logs will show how much memory is currently allocated/consumed by the application & how much free resources available in the queue at that specific time.  eg.,  2024-04-22 23:27:20,636 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:843776, vCores:206> Free: <memory:2048, vCores:306> pendingRequests: 0 delayedContainers: 205 heartbeats: 101 lastPreemptionHeartbeat: 100  2024-04-22 23:27:30,660 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:155648, vCores:38> Free: <memory:495616, vCores:356> pendingRequests: 0 delayedContainers: 38 heartbeats: 151 lastPreemptionHeartbeat: 150  This allocation details will be logged frequently in Tez AM logs. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		04-15-2024
	
		
		11:05 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		2 Kudos
		
	
				
		
	
		
					
							 Existing files won't be rewritten by delete query, instead deleted rows ROW__ID will be written in new delete_delta folder. Read queries will apply deleted ROW__ID on existing files to exclude the rows.  Triggering Major compaction on the table will rewrite new files merging delta & delete_delta folder. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-26-2022
	
		
		03:54 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 @Griggsy Hope above suggestion helped. Let me know if otherwise. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-22-2022
	
		
		06:24 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		2 Kudos
		
	
				
		
	
		
					
							 @Griggsy   Partition discovery & retention is only for external table & data purge is disabled by default for external tables.  ie., If tblproperties has ('discover.partitions'='true' & 'partition.retention.period'='2m' ) without 'external.table.purge'='true'. Consecutive iterations will repeatedly add/remove partition till underlying HDFS partition folder is removed. I suspect you are seeing the same situation.  eg.,      ==> Expired partition is removed from Metadata in Iteration 1 <== 
2022-09-23T02:43:36,786 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true  true)
2022-09-23T02:43:36,822 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse]
==> As HDFS partition folder exist, discover.partition added it back in Iteration 2 <==
2022-09-23T02:48:36,818 INFO  [PartitionDiscoveryTask-1]: metastore.HiveMetaStoreChecker (HiveMetaStoreChecker.java:checkMetastore(144)) - Number of partitionsNotInMs=[test:dept=cse], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[]
2022-09-23T02:48:36,818 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 1 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true  true)
==> Expired partition is removed from Metadata in Iteration 3 <== 
2022-09-23T02:53:36,805 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true  true)
2022-09-23T02:53:36,837 INFO  [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse]     Please use following 3 tblproperties accordingly to make it work as you expect.   1) 'discover.partitions'='true' -- To discover partitions from HDFS path or remove partition metadata from HMS when HDFS path is removed.  2) 'partition.retention.period'='<retention_period>' -- To remove partitions after retention period  3) 'external.table.purge'='true' -- To remove HDFS partition folder, so that partition discover won't add it back to HMS.  PS: To enable global external table purge, set hive.external.table.purge.default=true cluster wide in ambari -> hive -> custom hive-site 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-13-2022
	
		
		10:50 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 Can you let me know your cluster version ?  Please check AutoPartitionDiscovery thread (PartitionManagementTask) is running in HMS logs.  If it is running, you should be seeing following log line in HMS. Make sure you don't have specific db/table pattern.     HMS
org.apache.hadoop.hive.metastore.PartitionManagementTask: [Metastore Scheduled Worker 3]: Looking for tables using catalog: hive dbPattern: * tablePattern: * found: 107
HS2/HMS
org.apache.hadoop.hive.metastore.HiveMetaStoreChecker: [HiveServer2-Background-Pool: Thread-123]: Number of partitionsNotInMs=[], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[]
org.apache.hadoop.hive.metastore.Msck: [HiveServer2-Background-Pool: Thread-123]: hive.default.test_table - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true  true)     In CDP cluster managed by CM, set metastore.msck.repair.enable.partition.retention=true at Hive -> Configuration -> Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml     In HDP-3.x managed by Ambari, set metastore.msck.repair.enable.partition.retention=true in Hive -> custom hms-site  Restart applicable services. expiredPartitions should be visible in logs & should be removed by HMS at scheduled interval once they become eligible. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-08-2022
	
		
		08:03 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Partition Retention is disabled by default in HDP-3.1.4.x  set metastore.msck.repair.enable.partition.retention=true in ambari -> Hive -> custom hms-site & restart HMS. HMS should automatically take care. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-08-2022
	
		
		10:45 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 I am able to run the above SQL in CDP 7.1.7 SP1 without any issue. Please try it in CDP. 
						
					
					... View more