Member since: 08-18-2017
Posts: 145
Kudos Received: 19
Solutions: 17

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1579 | 05-09-2024 02:50 PM
 | 5081 | 09-13-2022 10:50 AM
 | 2397 | 07-25-2022 12:18 AM
 | 4533 | 06-24-2019 01:56 PM
 | 2104 | 10-13-2018 04:40 AM
08-08-2022
05:51 PM
Both the queries should work fine in CDP 7.x Hive. Can you share the exception trace if you face any issue?
07-25-2022
07:52 AM
The AppId should be logged in the client console/logs; please see the following example:

INFO : Executing command(queryId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de): <<query>>
...
INFO : Status: Running (Executing on YARN cluster with App id application_1658422185646_0004)

Otherwise, the HS2 logs should help you find the AppId: follow the execution thread of the queryId to the corresponding AppId, as in the following example.

2022-07-25 07:10:32,112 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-123456]: Executing command(queryId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de): <<query>>
2022-07-25 07:10:32,196 INFO org.apache.tez.client.TezClient: [HiveServer2-Background-Pool: Thread-123456]: Submitting dag to TezSession, sessionName=HIVE-xxxx, applicationId=application_1658422185646_0004, dagName=<<query>>) (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de }
07-25-2022
12:18 AM
@Hafiz I am able to achieve it with posexplode; please find my solution below.

create table comm_article(id int, key string, itemno string, value string);
insert into comm_article values(1, 'A', '1|2|3', '10|20|30'), (1, 'B', '1|2|3', '40|50|60'), (1, 'C', '1|2|3', '70|80|90');
select * from comm_article;
+------------------+-------------------+----------------------+---------------------+
| comm_article.id | comm_article.key | comm_article.itemno | comm_article.value |
+------------------+-------------------+----------------------+---------------------+
| 1 | A | 1|2|3 | 10|20|30 |
| 1 | B | 1|2|3 | 40|50|60 |
| 1 | C | 1|2|3 | 70|80|90 |
+------------------+-------------------+----------------------+---------------------+
with t as (select id, key, split(itemno, '[|]') as itemno_arr, split(value, '[|]') as value_arr from comm_article)
select id, key, itemno, value from t
lateral view posexplode(itemno_arr) myTable1 AS myCol1, itemno
lateral view posexplode(value_arr) myTable2 AS myCol2, value
where myCol1 = myCol2;
+-----+------+---------+--------+
| id | key | itemno | value |
+-----+------+---------+--------+
| 1 | A | 1 | 10 |
| 1 | A | 2 | 20 |
| 1 | A | 3 | 30 |
| 1 | B | 1 | 40 |
| 1 | B | 2 | 50 |
| 1 | B | 3 | 60 |
| 1 | C | 1 | 70 |
| 1 | C | 2 | 80 |
| 1 | C | 3 | 90 |
+-----+------+---------+--------+

If this helps to address your ask, please accept the solution.
07-24-2022
11:41 PM
@dmharshit The YARN queue where the Tez application gets submitted doesn't have enough capacity to start a new application. You should be able to see the Tez application in the YARN RM under accepted applications. Tune the NM capacity and/or reduce the following configs to let YARN accept and launch the new application:

-- Setting AM size to 512mb
set tez.am.resource.memory.mb=512;

-- Setting Task container size to 512mb
set hive.tez.container.size=512;
01-29-2021
03:01 PM
I suspect the cause of the "alter table" slowness is the same as the issue described in HIVE-23959. HIVE-23806 might also help your situation.
06-24-2019
01:56 PM
1 Kudo
These WARN logs should not cause any issue. However:

1) If you want to remove these configs, you can use the syntax below to delete a config on a specific component config_type (configs can be searched in Ambari -> Hive configs filter box to know which file needs to be updated):

/var/lib/ambari-server/resources/scripts/configs.py -u <<username>> -p <<password>> -n <<clustername>> -l <<ambari-server-host>> -t <<ambari-server-port>> -a <<action>> -c <<config_type>> -k <<config-key>>

e.g.,

/var/lib/ambari-server/resources/scripts/configs.py -u admin -p <<dummy>> -n cluster1 -l ambari-server-host -t 8080 -a delete -c hive-site -k hive.mapred.strict
/var/lib/ambari-server/resources/scripts/configs.py -u admin -p <<dummy>> -n cluster1 -l ambari-server-host -t 8080 -a delete -c hive-site -k hive.mapred.supports.subdirectories

This is the reference for configs.py: https://cwiki.apache.org/confluence/display/AMBARI/Modify+configurations#Modifyconfigurations-Editconfigurationusingconfigs.py

2) To remove the log4j warning, go to Ambari -> Hive configs -> Advanced hive-log4j and comment out the line below:

log4j.appender.DRFA.MaxFileSize

After the above 2 changes, restart the Hive services and all 3 WARNs should go away. If this helps to resolve the issue, accept the answer; it might also help other members in the community.
01-14-2019
03:57 AM
If the input is text data, it's better to write a Java utility that reads the file line by line, applies all the rules, and rewrites it as four columns per the required conversion; then load that new file into Hive to query it. It would be cumbersome to do this in SQL, as it needs a case statement inside another case statement to validate the length of the split.
01-11-2019
03:55 PM
You can use the SQL below to convert the content as required:

0: jdbc:hive2://localhost> select * from temptable;
+---------------------+--+
| temptable.fullname |
+---------------------+--+
| abc xxx! def ghi |
| abc systems! |
+---------------------+--+
0: jdbc:hive2://localhost> select a[0] as LAST, substr(a[1], 0, length(a[1])-1) as TITLE, a[2] as FIRST, a[3] as MID from (
0: jdbc:hive2://localhost> select case
0: jdbc:hive2://localhost> when instr(fullname, "!" )=length(fullname) then array('', '!', substr(fullname, 0, length(fullname)-1), '')
0: jdbc:hive2://localhost> else split(fullname, " ") end as a from temptable) t;
+-------+--------+--------------+------+--+
| last | title | first | mid |
+-------+--------+--------------+------+--+
| abc | xxx | def | ghi |
| | | abc systems | |
+-------+--------+--------------+------+--+

If my answer helped to resolve the issue, please accept it. It might help others in the community.
10-15-2018
06:26 AM
Bucketed tables allow much more efficient sampling than non-bucketed tables: with sampling, we can try out queries on a section of the data for testing and debugging when the original data sets are very large. False positives are possible on clustered-by columns, whereas applying filters on a partition column is better at pruning the exact files. Bucketing also helps with map joins, assuming the data is equally distributed, as it eliminates the cost of partitioning the data while joining. Bucketing is useful with high-cardinality columns, as it reduces the number of small files and clusters the data within the specified buckets. Prior to HIVE-11525, bucketing wasn't able to filter files if multiple columns were clustered by. A minimal sketch of bucketed sampling follows below.
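To illustrate the sampling point, here is a minimal sketch; the table and column names are placeholders, not from the original question:

-- Hypothetical table, bucketed on a high-cardinality column.
create table user_events(user_id bigint, event string)
clustered by (user_id) into 32 buckets;

-- Reads roughly 1/32 of the data by scanning only the first bucket,
-- instead of sampling on top of a full-table scan.
select * from user_events tablesample(bucket 1 out of 32 on user_id);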
10-13-2018
09:00 AM
In a count(*) query, the final aggregation vertex should always be a single task, which fetches the counts from all the mappers and sums them up (a quick way to confirm this is sketched below). If my response helped with your query, accept the answer. It might help others in the community.
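As a minimal illustration (the table name is a placeholder), the query plan should show a Map vertex emitting partial counts and a final Reducer vertex with a single task that sums them:

-- Inspect the plan; the final aggregation stage runs as one task.
explain select count(*) from some_table;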