About nramanaiah

nramanaiah · ‎05-09-2024

Hive can't understand (20) as negative value. You should explicitly mention -20 in column value. ie., +----------------+---------------+---------------+---------------+---------------+ | test.currency | test.spend_a | test.spend_b | test.spend_c | test.spend_d | +----------------+---------------+---------------+---------------+---------------+ | GBP | 50 | 25 | -20 | 5 | | JPY | 50 | 25 | -20 | 5 | | AUD | 50 | 25 | -20 | 5 | +----------------+---------------+---------------+---------------+---------------+ select Currency, (spend_a + spend_b + spend_c + spend_d) as total_spend from test; +-----------+--------------+ | currency | total_spend | +-----------+--------------+ | GBP | 60.0 | | JPY | 60.0 | | AUD | 60.0 | +-----------+--------------+

nramanaiah · ‎04-26-2024

I dint see the full query in your description. Are you aggregating (SUM) delta column based on customer_id ? I did the following simple repro without aggregates & able to get the results as expected. create external table Customer(customer_id string, money_spent bigint) stored as textfile; insert into customer values("0001", 500), ("0002", 250), ("0003", 100); create external table sales(customer_id string, order_id string, amount bigint) stored as textfile; insert into sales values("0001", "5005", 500), ("0002", "5006", 250), ("0003", "5007", 50); select "Sales Check" as rule, (a.money_spent - b.amount) as delta from customer a join sales b on a.customer_id = b.customer_id; +--------------+--------+ | rule | delta | +--------------+--------+ | Sales Check | 0 | | Sales Check | 0 | | Sales Check | 50 | +--------------+--------+ 3 rows selected (15.186 seconds) Let me know if this helps.

nramanaiah · ‎04-25-2024

If you want to join the tables, you should have primary-foreign key relation. ie., sales table should have customer_id column or customer table should have order_id column to map rows between 2 tables. Otherwise this is not achievable.

nramanaiah · ‎04-23-2024

If its a tez application, AM logs will show how much memory is currently allocated/consumed by the application & how much free resources available in the queue at that specific time. eg., 2024-04-22 23:27:20,636 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:843776, vCores:206> Free: <memory:2048, vCores:306> pendingRequests: 0 delayedContainers: 205 heartbeats: 101 lastPreemptionHeartbeat: 100 2024-04-22 23:27:30,660 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:155648, vCores:38> Free: <memory:495616, vCores:356> pendingRequests: 0 delayedContainers: 38 heartbeats: 151 lastPreemptionHeartbeat: 150 This allocation details will be logged frequently in Tez AM logs.

nramanaiah · ‎04-15-2024

Existing files won't be rewritten by delete query, instead deleted rows ROW__ID will be written in new delete_delta folder. Read queries will apply deleted ROW__ID on existing files to exclude the rows. Triggering Major compaction on the table will rewrite new files merging delta & delete_delta folder.

nramanaiah · ‎09-26-2022

@Griggsy Hope above suggestion helped. Let me know if otherwise.

nramanaiah · ‎09-22-2022

@Griggsy Partition discovery & retention is only for external table & data purge is disabled by default for external tables. ie., If tblproperties has ('discover.partitions'='true' & 'partition.retention.period'='2m' ) without 'external.table.purge'='true'. Consecutive iterations will repeatedly add/remove partition till underlying HDFS partition folder is removed. I suspect you are seeing the same situation. eg., ==> Expired partition is removed from Metadata in Iteration 1 <== 2022-09-23T02:43:36,786 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true true) 2022-09-23T02:43:36,822 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse] ==> As HDFS partition folder exist, discover.partition added it back in Iteration 2 <== 2022-09-23T02:48:36,818 INFO [PartitionDiscoveryTask-1]: metastore.HiveMetaStoreChecker (HiveMetaStoreChecker.java:checkMetastore(144)) - Number of partitionsNotInMs=[test:dept=cse], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[] 2022-09-23T02:48:36,818 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 1 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true true) ==> Expired partition is removed from Metadata in Iteration 3 <== 2022-09-23T02:53:36,805 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true true) 2022-09-23T02:53:36,837 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse] Please use following 3 tblproperties accordingly to make it work as you expect. 1) 'discover.partitions'='true' -- To discover partitions from HDFS path or remove partition metadata from HMS when HDFS path is removed. 2) 'partition.retention.period'='<retention_period>' -- To remove partitions after retention period 3) 'external.table.purge'='true' -- To remove HDFS partition folder, so that partition discover won't add it back to HMS. PS: To enable global external table purge, set hive.external.table.purge.default=true cluster wide in ambari -> hive -> custom hive-site

nramanaiah · ‎09-13-2022

Can you let me know your cluster version ? Please check AutoPartitionDiscovery thread (PartitionManagementTask) is running in HMS logs. If it is running, you should be seeing following log line in HMS. Make sure you don't have specific db/table pattern. HMS org.apache.hadoop.hive.metastore.PartitionManagementTask: [Metastore Scheduled Worker 3]: Looking for tables using catalog: hive dbPattern: * tablePattern: * found: 107 HS2/HMS org.apache.hadoop.hive.metastore.HiveMetaStoreChecker: [HiveServer2-Background-Pool: Thread-123]: Number of partitionsNotInMs=[], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[] org.apache.hadoop.hive.metastore.Msck: [HiveServer2-Background-Pool: Thread-123]: hive.default.test_table - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true true) In CDP cluster managed by CM, set metastore.msck.repair.enable.partition.retention=true at Hive -> Configuration -> Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml In HDP-3.x managed by Ambari, set metastore.msck.repair.enable.partition.retention=true in Hive -> custom hms-site Restart applicable services. expiredPartitions should be visible in logs & should be removed by HMS at scheduled interval once they become eligible.

nramanaiah · ‎09-08-2022

Partition Retention is disabled by default in HDP-3.1.4.x set metastore.msck.repair.enable.partition.retention=true in ambari -> Hive -> custom hms-site & restart HMS. HMS should automatically take care.

nramanaiah · ‎08-08-2022

I am able to run the above SQL in CDP 7.1.7 SP1 without any issue. Please try it in CDP.

Online	Offline
Last Visited	‎11-22-2024 12:30 PM

Member Since	‎08-18-2017 06:55 AM
Last Visited	‎11-22-2024 12:30 PM
Posts	145
Kudos received	19

Cloudera Community

Re: Hive query to check mathematical values from m...

Re: Partition Retention Period not working on Hive...

Re: Parsing multiple pipe delimited columns into r...

Re: Hive CLI session warnings

Re: question about clustered only table

Re: Hive query to check mathematical values from m...

Re: Hive query to check mathematical values from m...

Re: Hive query to check mathematical values from m...

Re: Can we see if the application logs is having r...

Re: DELETE rows in table, how HDFS file size is im...

Re: Partition Retention Period not working on Hive...

Re: Partition Retention Period not working on Hive...

Re: Partition Retention Period not working on Hive...

Re: Partition Retention Period not working on Hive...

Re: How can I solve this sql problem? (like join i...