Member since
08-18-2017
145
Posts
19
Kudos Received
17
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1452 | 05-09-2024 02:50 PM | |
4878 | 09-13-2022 10:50 AM | |
2302 | 07-25-2022 12:18 AM | |
4447 | 06-24-2019 01:56 PM | |
2064 | 10-13-2018 04:40 AM |
05-09-2024
02:50 PM
2 Kudos
Hive can't understand (20) as negative value. You should explicitly mention -20 in column value. ie., +----------------+---------------+---------------+---------------+---------------+
| test.currency | test.spend_a | test.spend_b | test.spend_c | test.spend_d |
+----------------+---------------+---------------+---------------+---------------+
| GBP | 50 | 25 | -20 | 5 |
| JPY | 50 | 25 | -20 | 5 |
| AUD | 50 | 25 | -20 | 5 |
+----------------+---------------+---------------+---------------+---------------+
select Currency, (spend_a + spend_b + spend_c + spend_d) as total_spend from test;
+-----------+--------------+
| currency | total_spend |
+-----------+--------------+
| GBP | 60.0 |
| JPY | 60.0 |
| AUD | 60.0 |
+-----------+--------------+
... View more
04-26-2024
08:41 AM
1 Kudo
I dint see the full query in your description. Are you aggregating (SUM) delta column based on customer_id ? I did the following simple repro without aggregates & able to get the results as expected. create external table Customer(customer_id string, money_spent bigint) stored as textfile; insert into customer values("0001", 500), ("0002", 250), ("0003", 100); create external table sales(customer_id string, order_id string, amount bigint) stored as textfile; insert into sales values("0001", "5005", 500), ("0002", "5006", 250), ("0003", "5007", 50); select "Sales Check" as rule, (a.money_spent - b.amount) as delta from customer a join sales b on a.customer_id = b.customer_id; +--------------+--------+ | rule | delta | +--------------+--------+ | Sales Check | 0 | | Sales Check | 0 | | Sales Check | 50 | +--------------+--------+ 3 rows selected (15.186 seconds) Let me know if this helps.
... View more
04-25-2024
07:51 AM
If you want to join the tables, you should have primary-foreign key relation. ie., sales table should have customer_id column or customer table should have order_id column to map rows between 2 tables. Otherwise this is not achievable.
... View more
04-23-2024
06:59 AM
If its a tez application, AM logs will show how much memory is currently allocated/consumed by the application & how much free resources available in the queue at that specific time. eg., 2024-04-22 23:27:20,636 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:843776, vCores:206> Free: <memory:2048, vCores:306> pendingRequests: 0 delayedContainers: 205 heartbeats: 101 lastPreemptionHeartbeat: 100 2024-04-22 23:27:30,660 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:155648, vCores:38> Free: <memory:495616, vCores:356> pendingRequests: 0 delayedContainers: 38 heartbeats: 151 lastPreemptionHeartbeat: 150 This allocation details will be logged frequently in Tez AM logs.
... View more
04-15-2024
11:05 AM
2 Kudos
Existing files won't be rewritten by delete query, instead deleted rows ROW__ID will be written in new delete_delta folder. Read queries will apply deleted ROW__ID on existing files to exclude the rows. Triggering Major compaction on the table will rewrite new files merging delta & delete_delta folder.
... View more
09-26-2022
03:54 PM
@Griggsy Hope above suggestion helped. Let me know if otherwise.
... View more
09-22-2022
06:24 PM
2 Kudos
@Griggsy Partition discovery & retention is only for external table & data purge is disabled by default for external tables. ie., If tblproperties has ('discover.partitions'='true' & 'partition.retention.period'='2m' ) without 'external.table.purge'='true'. Consecutive iterations will repeatedly add/remove partition till underlying HDFS partition folder is removed. I suspect you are seeing the same situation. eg., ==> Expired partition is removed from Metadata in Iteration 1 <==
2022-09-23T02:43:36,786 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true true)
2022-09-23T02:43:36,822 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse]
==> As HDFS partition folder exist, discover.partition added it back in Iteration 2 <==
2022-09-23T02:48:36,818 INFO [PartitionDiscoveryTask-1]: metastore.HiveMetaStoreChecker (HiveMetaStoreChecker.java:checkMetastore(144)) - Number of partitionsNotInMs=[test:dept=cse], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[]
2022-09-23T02:48:36,818 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 1 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true true)
==> Expired partition is removed from Metadata in Iteration 3 <==
2022-09-23T02:53:36,805 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true true)
2022-09-23T02:53:36,837 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse] Please use following 3 tblproperties accordingly to make it work as you expect. 1) 'discover.partitions'='true' -- To discover partitions from HDFS path or remove partition metadata from HMS when HDFS path is removed. 2) 'partition.retention.period'='<retention_period>' -- To remove partitions after retention period 3) 'external.table.purge'='true' -- To remove HDFS partition folder, so that partition discover won't add it back to HMS. PS: To enable global external table purge, set hive.external.table.purge.default=true cluster wide in ambari -> hive -> custom hive-site
... View more
09-13-2022
10:50 AM
1 Kudo
Can you let me know your cluster version ? Please check AutoPartitionDiscovery thread (PartitionManagementTask) is running in HMS logs. If it is running, you should be seeing following log line in HMS. Make sure you don't have specific db/table pattern. HMS
org.apache.hadoop.hive.metastore.PartitionManagementTask: [Metastore Scheduled Worker 3]: Looking for tables using catalog: hive dbPattern: * tablePattern: * found: 107
HS2/HMS
org.apache.hadoop.hive.metastore.HiveMetaStoreChecker: [HiveServer2-Background-Pool: Thread-123]: Number of partitionsNotInMs=[], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[]
org.apache.hadoop.hive.metastore.Msck: [HiveServer2-Background-Pool: Thread-123]: hive.default.test_table - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true true) In CDP cluster managed by CM, set metastore.msck.repair.enable.partition.retention=true at Hive -> Configuration -> Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml In HDP-3.x managed by Ambari, set metastore.msck.repair.enable.partition.retention=true in Hive -> custom hms-site Restart applicable services. expiredPartitions should be visible in logs & should be removed by HMS at scheduled interval once they become eligible.
... View more
09-08-2022
08:03 PM
Partition Retention is disabled by default in HDP-3.1.4.x set metastore.msck.repair.enable.partition.retention=true in ambari -> Hive -> custom hms-site & restart HMS. HMS should automatically take care.
... View more
08-08-2022
10:45 PM
I am able to run the above SQL in CDP 7.1.7 SP1 without any issue. Please try it in CDP.
... View more