Member since: 08-18-2017
Posts: 72
Kudos Received: 11
Solutions: 14
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1801 | 06-24-2019 01:56 PM |
 | 777 | 10-13-2018 04:40 AM |
 | 708 | 10-13-2018 04:24 AM |
 | 1620 | 10-05-2018 02:59 PM |
 | 1592 | 10-04-2018 09:06 AM |
09-12-2018
04:06 PM
You can create an external table with LOCATION and write the text files directly to that path, e.g.:
create external table staging1(id struct<tid:string,action:string,createdts:timestamp>, cid string, anumber string) row format delimited fields terminated by ',' collection items terminated by '|' stored as textfile LOCATION '/tmp/staging/';
Text files can then be written directly to /tmp/staging/ by Kafka or Flume. If Kafka or Flume is able to generate Avro files, you can skip the staging table, create an external Avro table, and write the Avro files directly to that table's location.
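A rough sketch of such an external Avro table (the table name and path are illustrative, and this assumes your Hive version's AvroSerDe supports the timestamp type):
create external table landing_avro(
  id struct<tid:string,action:string,createdts:timestamp>,
  cid string,
  anumber string)
stored as avro
location '/tmp/avro_landing/';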
09-12-2018
03:55 PM
Yes. The file content will be
# hadoop fs -cat /tmp/data1.txt
1|success|2018-09-12 17:45:39.69,3,12345
Then you need to load the content into the staging table using the command below:
load data inpath '/tmp/data1.txt' into table staging;
Then from staging, load it into the actual Avro table using the command below:
insert into testtbl select * from staging;
If my answer helped you to resolve your issue, you can accept it. It will be helpful for others.
09-12-2018
03:09 PM
I assume the raw data is in text and you want to convert and load the data into Avro tables. If so, you can create another identical text table and specify the delimiters in the data, i.e.:
create table staging(id struct<tid:string,action:string,createdts:timestamp>, cid string, anumber string) row format delimited fields terminated by ',' collection items terminated by '|' stored as textfile;
Sample text data can look like this:
1|success|150987428888,3,12345
insert into testtbl select * from staging;
If Kafka or Flume is generating Avro files directly, those files can be written into the table path directly. It's better to create an external table if source files are written directly into the table path.
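For completeness, a minimal sketch of what the Avro-backed target table could look like (the name testtbl comes from the insert above; the column list mirrors the staging table and is otherwise an assumption):
create table testtbl(
  id struct<tid:string,action:string,createdts:timestamp>,
  cid string,
  anumber string)
stored as avro;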
09-12-2018
03:01 PM
It's not possible to use functions in an INSERT INTO TABLE ... VALUES statement.
09-12-2018
01:10 PM
Try the insert statement below:
0: jdbc:hive2://abcd:10000> with t as (select NAMED_STRUCT('tid','1','action','success', 'createdts',current_timestamp) as id ,'1' as cid,'12345' as anumber)
0: jdbc:hive2://abcd:10000> insert into testtbl select * from t;
No rows affected (20.464 seconds)
0: jdbc:hive2://abcd:10000> select * from testtbl;
+-----------------------------------------------------------------------+--------------+------------------+--+
| testtbl.id | testtbl.cid | testtbl.anumber |
+-----------------------------------------------------------------------+--------------+------------------+--+
| {"tid":"1","action":"success","createdts":"2018-09-12 15:06:27.075"} | 1 | 12345 |
+-----------------------------------------------------------------------+--------------+------------------+--+
09-11-2018
07:14 AM
Can you check what the value of tez.runtime.unordered.output.buffer.size-mb is? I think it's configured to a higher value.
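For example, from beeline or the Hive CLI you can print the effective value and override it at the session level (the 100 below is only an illustrative value):
set tez.runtime.unordered.output.buffer.size-mb;
set tez.runtime.unordered.output.buffer.size-mb=100;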
08-17-2018
03:01 PM
I am thinking of a solution for the jira. This needs to be implemented in code; there is no config to do this for now.
08-17-2018
07:09 AM
Concatenation depends on which files are chosen first. The ordering of the files is not deterministic with CombineHiveInputFormat, since grouping happens at the Hadoop layer. Concatenation will split or combine files based on whether the ORC file size is greater or smaller than maxSplitSize. For example, say you have 5 files of 64MB, 64MB, 64MB, 64MB, and 512MB, with mapreduce.input.fileinputformat.split.minsize=256mb: this can result in 2 files (256MB, 512MB), or it may result in 3 files (256MB, 256MB, 256MB). I raised a jira for the same. An easy solution for this would be to add a path filter that skips files larger than maxSplitSize.
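For context, this is the ORC concatenation command the above refers to (the table and partition names here are illustrative):
alter table orders partition (dt='2018-08-17') concatenate;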
08-16-2018
07:10 AM
In beeline or the CLI, after creating the table, you can run either SHOW CREATE TABLE or DESCRIBE FORMATTED to find the table path in HDFS. After exiting beeline or the CLI, you can use the command below to see the table folder and the files inside it:
hadoop fs -ls -R <tablePath>
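For example (the table name is illustrative; the table's Location shows up in the output of either statement):
show create table mytable;
describe formatted mytable;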
08-01-2018
07:18 AM
This validation was intentionally added in Spark with SPARK-15279, as it doesn't make sense to provide delimiters for ORC or Parquet files.
07-30-2018
05:37 AM
After creating the external table with LOCATION, can you run "msck repair table data"? It should automatically add the partition information from the folder paths to the Hive metadata.
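A minimal sketch of that flow, assuming a partitioned external table named data whose partition folders already exist under the location (the columns and path here are hypothetical):
create external table data (id int, name string)
partitioned by (dt string)
row format delimited fields terminated by ','
stored as textfile
location '/tmp/data/';
msck repair table data;
show partitions data;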
05-28-2018
04:26 PM
1 Kudo
@cskbhatt, I assume the external table location is "hdfs://<emr node>:8020/poc/test_table/". This issue is happening because hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file but exists inside the table folder. When Hive's ParquetRecordReader tries to read this file, it throws the above exception. Remove all non-Parquet files from the table location and retry your query.
04-18-2018
06:40 AM
I am glad the problem is resolved. Both 1) http://nexus-private.hortonworks.com/nexus/content/groups/public/ and 2) http://repo.hortonworks.com/content/groups/public/ will resolve this issue, as mentioned in my previous link to the Hortonworks public repo. Please accept the answer.
04-17-2018
08:56 AM
@Gaurang Shah
You are using hortonworks.extrepo (http://repo.hortonworks.com/content/repositories/releases, releases=true, snapshots=true). This repo doesn't have the dependent jars; can you please use http://nexus-private.hortonworks.com/nexus/content/groups/public/ instead?
This repo has all the required dependencies, e.g. 1) org.mortbay.jetty:jetty-util:jar:6.1.26.hwx http://nexus-private.hortonworks.com/nexus/content/groups/public/org/mortbay/jetty/jetty-util/6.1.26.hwx/
04-16-2018
07:53 AM
Use the Hortonworks public repo and try the Maven dependency below to compile a custom UDF against a specific HDP version.
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1000.2.6.4.0-91</version>
</dependency>
The Hortonworks Maven public repo can be referred to from this link.
03-05-2018
10:16 AM
This can be related to HIVE-13901. Depending on the filesystem, MSCK and ADD PARTITION can be slow. Can you try setting "hive.fetch.task.conversion=none"?
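For example, at the session level:
set hive.fetch.task.conversion=none;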
02-19-2018
09:00 AM
Was any fat jar added for the UDF? Can you enable verbose mode to see which jar is getting loaded?
02-19-2018
06:55 AM
One possible reason I can think of is that you have a Hive 1.x serde jar in the LLAP classpath. I.e., in Hive 1.x the return type of ColumnProjectionUtils.getReadColumnNames is List<String>, whereas in Hive 2.x the return type is String[]. That method return type mismatch is what throws the NoSuchMethodError.
02-16-2018
08:26 AM
AFAIK, the Phoenix storage handlers are compiled with a Hive 1.x dependency. @Sergey Soldatov, are we supporting Phoenix storage handlers in HS2 Interactive / Hive 2.x?
02-09-2018
07:23 AM
varchar requires the column length to be specified in Hive. Can you try the query below?
create external table default.val_deriv_orc (deriv varchar(10), deriv_descr varchar(15)) stored as orcfile;
If you are not sure about the maximum length of the column data, it's better to use the string datatype: varchar will silently truncate column data longer than the specified length.
0: jdbc:hive2://localhost:10000> create external table default.val_deriv_orc (deriv varchar(4) ,deriv_descr varchar(5))stored as orcfile;
0: jdbc:hive2://localhost:10000> insert into default.val_deriv_orc values('abcdef','abcde');
0: jdbc:hive2://localhost:10000> select * from default.val_deriv_orc;
+----------------------+----------------------------+--+
| val_deriv_orc.deriv | val_deriv_orc.deriv_descr |
+----------------------+----------------------------+--+
| abcd | abcde |
+----------------------+----------------------------+--+
02-01-2018
05:19 AM
You can check for the relevant issue in the Hive JIRA.
01-31-2018
12:23 PM
As @gopal said, HS2 (Interactive) has better SQL support. The Hive CLI always uses Hive 1.x, whereas HS2 (Interactive) uses Hive 2.x. If you are using beeline, you can connect to both HS2 and HS2 (Interactive) depending on the URL provided; no need to reload the data. If the above SQL needs to be supported in HDP 2.6.0 Hive 1.x, then I think you might need HIVE-15211 and HIVE-16885.
01-31-2018
09:09 AM
@Carlton Patterson please try the create statement below for the acorn_category_frequency table:
Create table acorn_category_frequency
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'
as
select acorn_category,
count(*) as acorn_categorycount
from geography
group by acorn_category
01-31-2018
08:56 AM
In a secure setup, to view the Active Sessions, Open Queries, and Last Max History Queries tabs, the keytab and principal passed to the HS2 Interactive UI should belong to an admin user. Admin users are configured using "hive.users.in.admin.role"; the SPNEGO principal should be one of the admin users specified in this property. You can refer to the link for HS2 Interactive UI configs.
01-31-2018
07:25 AM
@Hanu V This is related to HIVE-12082. The issue exists in Hive 1.
1) You can use Hive 2 (Interactive HS2)
2) Work around it by explicitly casting to int: select name, greatest(int(sub1),int(sub2),int(sub3)) from students;
3) Use the ORC format
01-30-2018
11:24 AM
Please try the command below for creating the HiveSampleIn table:
CREATE EXTERNAL TABLE HiveSampleIn
(
anonid INT,
eprofileclass INT,
fueltypes STRING,
acorn_category INT,
acorn_group STRING,
acorn_type INT,
nuts4 STRING,
lacode STRING,
nuts1 STRING,
gspgroup STRING,
ldz STRING,
gas_elec STRING,
gas_tout STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'
tblproperties ("skip.header.line.count"="2");
01-30-2018
09:41 AM
CREATE EXTERNAL TABLE HiveSampleIn ... LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; -- the statement gets terminated here by the semicolon
tblproperties ("skip.header.line.count"="2"); -- this is then parsed as a separate statement, which is likely what throws the ParseException
12-11-2017
11:10 AM
1 Kudo
You can try the commands below for updating existing array data. If you want to overwrite the existing value of the array column with a new value, you can use this update statement:
update test1 set dept=array('marketing', 'FINANCE') where name='a';
If you want to append a new value, you can use this update statement:
update test1 set dept=split(concat_ws(',',dept,'FINANCE'),',') where name ='a';
12-11-2017
10:17 AM
Glad to know that your problem is resolved. Can you please accept this answer.
12-11-2017
05:25 AM
You can create a data file with delimiters and use the load command to load the data into the table, as in the example below:
create table test1(name string, age int, dept array<string>) row format delimited fields terminated by ':' collection items terminated by ',' stored as textfile;
Content in the text file should be delimited as follows:
a:25:sales,marketing
b:26:dev,qa
...
name(string):age(int):dept(array<string> delimited with ,)
0: jdbc:hive2://abcd:10000> load data inpath 'file_path' into table test1;
0: jdbc:hive2://abcd:10000> select * from test1;
+-------------+------------+--------------------+--+
| test1.name  | test1.age  | test1.dept         |
+-------------+------------+--------------------+--+
| a           | 25         | [sales,marketing]  |
| b           | 26         | [dev,qa]           |
+-------------+------------+--------------------+--+