Member since: 08-18-2017
Posts: 129
Kudos Received: 13
Solutions: 16
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1770 | 09-13-2022 10:50 AM |
 | 858 | 07-25-2022 12:18 AM |
 | 3335 | 06-24-2019 01:56 PM |
 | 1229 | 10-13-2018 04:40 AM |
 | 1100 | 10-13-2018 04:24 AM |
09-26-2022
03:54 PM
@Griggsy Hope the above suggestion helped. Let me know if it didn't.
09-22-2022
06:24 PM
1 Kudo
@Griggsy Partition discovery & retention apply only to external tables, and data purge is disabled by default for external tables. That is, if tblproperties has 'discover.partitions'='true' and 'partition.retention.period'='2m' without 'external.table.purge'='true', consecutive iterations will repeatedly add and remove the partition until the underlying HDFS partition folder is removed. I suspect you are seeing the same situation. For example:

==> Expired partition is removed from Metadata in Iteration 1 <==
2022-09-23T02:43:36,786 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true true)
2022-09-23T02:43:36,822 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse]
==> As the HDFS partition folder still exists, discover.partitions added it back in Iteration 2 <==
2022-09-23T02:48:36,818 INFO [PartitionDiscoveryTask-1]: metastore.HiveMetaStoreChecker (HiveMetaStoreChecker.java:checkMetastore(144)) - Number of partitionsNotInMs=[test:dept=cse], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[]
2022-09-23T02:48:36,818 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 1 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true true)
==> Expired partition is removed from Metadata in Iteration 3 <==
2022-09-23T02:53:36,805 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(135)) - hive.default.test - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 1 lockRequired: true (R: true A: true true)
2022-09-23T02:53:36,837 INFO [PartitionDiscoveryTask-1]: metastore.Msck (Msck.java:repair(287)) - Expired partitions: [test:dept=cse]

Please use the following 3 tblproperties together to make it work as you expect (see the example after this list):
1) 'discover.partitions'='true' -- to discover partitions from the HDFS path, or remove partition metadata from HMS when the HDFS path is removed.
2) 'partition.retention.period'='<retention_period>' -- to remove partitions after the retention period.
3) 'external.table.purge'='true' -- to remove the HDFS partition folder, so that partition discovery won't add it back to HMS.

PS: To enable external table purge globally, set hive.external.table.purge.default=true cluster-wide in Ambari -> Hive -> custom hive-site.
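As a minimal sketch of applying all three together (assuming a hypothetical external table named test partitioned by dept; the 2-minute retention period is for illustration only):

-- Hypothetical example: enable discovery, retention and purge on an existing external table.
-- Table name and retention period are illustrative; adjust them for your environment.
ALTER TABLE test SET TBLPROPERTIES (
  'discover.partitions'='true',
  'partition.retention.period'='2m',
  'external.table.purge'='true'
);

With all three set, once a partition passes the retention period HMS drops the partition metadata and the underlying HDFS folder, so partition discovery will not add it back.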
09-13-2022
10:50 AM
1 Kudo
Can you let me know your cluster version? Please check in the HMS logs that the auto partition discovery thread (PartitionManagementTask) is running. If it is running, you should see the following log lines. Also make sure you don't have a specific db/table pattern configured.

HMS:
org.apache.hadoop.hive.metastore.PartitionManagementTask: [Metastore Scheduled Worker 3]: Looking for tables using catalog: hive dbPattern: * tablePattern: * found: 107
HS2/HMS:
org.apache.hadoop.hive.metastore.HiveMetaStoreChecker: [HiveServer2-Background-Pool: Thread-123]: Number of partitionsNotInMs=[], partitionsNotOnFs=[], tablesNotInMs=[], tablesNotOnFs=[], expiredPartitions=[]
org.apache.hadoop.hive.metastore.Msck: [HiveServer2-Background-Pool: Thread-123]: hive.default.test_table - #partsNotInMs: 0 #partsNotInFs: 0 #expiredPartitions: 0 lockRequired: true (R: true A: true true)

In a CDP cluster managed by CM, set metastore.msck.repair.enable.partition.retention=true at Hive -> Configuration -> Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml. In HDP-3.x managed by Ambari, set metastore.msck.repair.enable.partition.retention=true in Hive -> custom hms-site. Restart the applicable services. expiredPartitions should then be visible in the logs, and eligible partitions should be removed by HMS at the scheduled interval.
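Once the setting is enabled and the services are restarted, a quick sanity check is to confirm the table itself carries the retention-related properties (a sketch, assuming the table is named test_table as in the log line above):

-- Hypothetical check: list the table properties and verify
-- 'discover.partitions' and 'partition.retention.period' are present.
SHOW TBLPROPERTIES test_table;

expiredPartitions only becomes non-zero for tables that have a partition.retention.period set and partitions older than that period.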
09-08-2022
08:03 PM
Partition retention is disabled by default in HDP-3.1.4.x. Set metastore.msck.repair.enable.partition.retention=true in Ambari -> Hive -> custom hms-site and restart HMS. HMS should then take care of it automatically.
08-08-2022
10:45 PM
I am able to run the above SQL in CDP 7.1.7 SP1 without any issue. Please try it in CDP.
08-08-2022
05:51 PM
Both queries should work fine in CDP-7.x Hive. Can you share the exception trace if you face any issue?
07-25-2022
07:52 AM
The AppId should be logged in the client console/logs; please see the following example:

INFO : Executing command(queryId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de): <<query>>
...
INFO : Status: Running (Executing on YARN cluster with App id application_1658422185646_0004)

Otherwise, the HS2 logs should help to find the AppId. Follow the execution thread of the queryId to find the corresponding AppId; please see the following example:

2022-07-25 07:10:32,112 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-123456]: Executing command(queryId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de): <<query>>
2022-07-25 07:10:32,196 INFO org.apache.tez.client.TezClient: [HiveServer2-Background-Pool: Thread-123456]: Submitting dag to TezSession, sessionName=HIVE-xxxx, applicationId=application_1658422185646_0004, dagName=<<query>>) (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de }
07-25-2022
12:18 AM
@Hafiz I am able to achieve it with posexplode; please find my solution below.

create table comm_article(id int, key string, itemno string, value string);
insert into comm_article values(1, 'A', '1|2|3', '10|20|30'), (1, 'B', '1|2|3', '40|50|60'), (1, 'C', '1|2|3', '70|80|90');
select * from comm_article;
+------------------+-------------------+----------------------+---------------------+
| comm_article.id | comm_article.key | comm_article.itemno | comm_article.value |
+------------------+-------------------+----------------------+---------------------+
| 1 | A | 1|2|3 | 10|20|30 |
| 1 | B | 1|2|3 | 40|50|60 |
| 1 | C | 1|2|3 | 70|80|90 |
+------------------+-------------------+----------------------+---------------------+
with t as (select id, key, split(itemno, '[|]') as itemno_arr, split(value, '[|]') as value_arr from comm_article)
select id, key, itemno, value from t
lateral view posexplode(itemno_arr) myTable1 AS myCol1, itemno
lateral view posexplode(value_arr) myTable2 AS myCol2, value
where myCol1 = myCol2;
+-----+------+---------+--------+
| id | key | itemno | value |
+-----+------+---------+--------+
| 1 | A | 1 | 10 |
| 1 | A | 2 | 20 |
| 1 | A | 3 | 30 |
| 1 | B | 1 | 40 |
| 1 | B | 2 | 50 |
| 1 | B | 3 | 60 |
| 1 | C | 1 | 70 |
| 1 | C | 2 | 80 |
| 1 | C | 3 | 90 |
+-----+------+---------+--------+

If this helps to address your ask, please accept the solution.
07-24-2022
11:41 PM
@dmharshit The YARN queue where the Tez application gets submitted doesn't have enough capacity to start a new application. You should be able to see the Tez application in YARN RM -> accepted applications. Tune the NM capacity and/or reduce the following configs to let YARN accept and launch the new application.

-- Setting AM size to 512mb
set tez.am.resource.memory.mb=512;
-- Setting Task container size to 512mb
set hive.tez.container.size=512;
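Before lowering anything, you can print what the session is currently using (a quick check from Beeline; running set on a property without a value just echoes its current setting):

-- Print current values before overriding them.
set tez.am.resource.memory.mb;
set hive.tez.container.size;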
01-29-2021
03:01 PM
I suspect the cause of the "alter table" slowness is the same as described here; HIVE-23959 & HIVE-23806 might help your situation.