Member since: 08-18-2017
Posts: 82
Kudos Received: 11
Solutions: 15
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 85 | 07-25-2022 12:18 AM
 | 1894 | 06-24-2019 01:56 PM
 | 801 | 10-13-2018 04:40 AM
 | 781 | 10-13-2018 04:24 AM
 | 1670 | 10-05-2018 02:59 PM
08-08-2022
10:45 PM
I am able to run the above SQL in CDP 7.1.7 SP1 without any issue. Please try it in CDP.
... View more
08-08-2022
05:51 PM
Both queries should work fine in CDP 7.x Hive. Can you share the exception trace if you face any issue?
... View more
07-25-2022
07:52 AM
The AppId should be logged in the client console/logs; please see the following example:
INFO : Executing command(queryId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de): <<query>>
...
INFO : Status: Running (Executing on YARN cluster with App id application_1658422185646_0004)
Otherwise, the HS2 logs should help to find the AppId. Follow the execution thread of the queryId to find the corresponding AppId, as in the following example:
2022-07-25 07:10:32,112 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-123456]: Executing command(queryId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de): <<query>>
2022-07-25 07:10:32,196 INFO org.apache.tez.client.TezClient: [HiveServer2-Background-Pool: Thread-123456]: Submitting dag to TezSession, sessionName=HIVE-xxxx, applicationId=application_1658422185646_0004, dagName=<<query>> (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=hive_20220725071031_107b4076-21e0-4da0-8cdd-d6c9e34c15de }
... View more
07-25-2022
12:18 AM
@Hafiz I am able to achieve this with posexplode; please find my solution below.
create table comm_article(id int, key string, itemno string, value string);
insert into comm_article values(1, 'A', '1|2|3', '10|20|30'), (1, 'B', '1|2|3', '40|50|60'), (1, 'C', '1|2|3', '70|80|90');
select * from comm_article;
+------------------+-------------------+----------------------+---------------------+
| comm_article.id | comm_article.key | comm_article.itemno | comm_article.value |
+------------------+-------------------+----------------------+---------------------+
| 1 | A | 1|2|3 | 10|20|30 |
| 1 | B | 1|2|3 | 40|50|60 |
| 1 | C | 1|2|3 | 70|80|90 |
+------------------+-------------------+----------------------+---------------------+
with t as (select id, key, split(itemno, '[|]') as itemno_arr, split(value, '[|]') as value_arr from comm_article)
select id, key, itemno, value from t
lateral view posexplode(itemno_arr) myTable1 AS myCol1, itemno
lateral view posexplode(value_arr) myTable2 AS myCol2, value
where myCol1 = myCol2;
+-----+------+---------+--------+
| id | key | itemno | value |
+-----+------+---------+--------+
| 1 | A | 1 | 10 |
| 1 | A | 2 | 20 |
| 1 | A | 3 | 30 |
| 1 | B | 1 | 40 |
| 1 | B | 2 | 50 |
| 1 | B | 3 | 60 |
| 1 | C | 1 | 70 |
| 1 | C | 2 | 80 |
| 1 | C | 3 | 90 |
+-----+------+---------+--------+
If this helps to address your ask, please accept the solution.
... View more
07-24-2022
11:41 PM
@dmharshit The YARN queue where the Tez application gets submitted doesn't have enough capacity to start a new application. You should be able to see the Tez application under accepted applications in the YARN RM UI. Tune the NM capacity and reduce the following configs to let YARN accept and launch the new application:
-- Setting AM size to 512 MB
set tez.am.resource.memory.mb=512;
-- Setting task container size to 512 MB
set hive.tez.container.size=512;
... View more
01-29-2021
03:01 PM
I suspect the cause of the "alter table" slowness is the same as described here. HIVE-23959 and HIVE-23806 might help your situation.
... View more
06-24-2019
01:56 PM
1 Kudo
These WARN logs should not cause any issue. 1) However, if you want to remove these configs, you can use the below syntax to delete a config from a specific component config_type (configs can be searched in the Ambari -> Hive configs filter box to know which file needs to be updated):
/var/lib/ambari-server/resources/scripts/configs.py -u <<username>> -p <<password>> -n <<clustername>> -l <<ambari-server-host>> -t <<ambari-server-port>> -a <<action>> -c <<config_type>> -k <<config-key>>
e.g.,
/var/lib/ambari-server/resources/scripts/configs.py -u admin -p <<dummy>> -n cluster1 -l ambari-server-host -t 8080 -a delete -c hive-site -k hive.mapred.strict
/var/lib/ambari-server/resources/scripts/configs.py -u admin -p <<dummy>> -n cluster1 -l ambari-server-host -t 8080 -a delete -c hive-site -k hive.mapred.supports.subdirectories
This is the reference for configs.py: https://cwiki.apache.org/confluence/display/AMBARI/Modify+configurations#Modifyconfigurations-Editconfigurationusingconfigs.py
2) To remove the log4j warning, go to Ambari -> Hive configs -> Advanced hive-log4j and comment out the below line:
log4j.appender.DRFA.MaxFileSize
After the above two changes, restart the Hive services and all three WARNs should go away. If this helps to resolve the issue, accept the answer; it might also help other members in the community.
... View more
04-22-2019
06:05 PM
You can try the below query to return empty results instead of the actual value:
select regexp_replace(salary, ".*", "") from employee;
If the intention is to mask the column, you can use a Ranger masking policy on the table column or use the mask UDFs, e.g.,
select mask(salary) from employee;
... View more
04-22-2019
04:29 PM
with t as (select *,row_number() over(partition by statename order by value desc ) as num from air_quality where measureid = 87 and reportyear >= 2008 and reportyear <= 2013)
select statename, measureid, reportyear, value from t where num=1 order by value desc limit 5
... View more
01-14-2019
03:57 AM
If the input is text data, it's better to write a Java utility which reads the file line by line, applies all the rules, and rewrites it as four columns as per the required conversion, then load that new file into Hive to query it. It will be cumbersome to do this in SQL, as it needs a case statement inside another case statement to validate the length of the split; a rough sketch of that SQL approach is below.
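For illustration only, a minimal sketch of that pure-SQL approach (the table name raw_text, the column line, and the space delimiter are hypothetical placeholders; the real rules would need more branches):
-- Split each line, then validate the token count before projecting four columns
with t as (select split(line, ' ') as a from raw_text)
select
  case when size(a) >= 1 then a[0] else '' end as col1,
  case when size(a) >= 2 then a[1] else '' end as col2,
  case when size(a) >= 3 then a[2] else '' end as col3,
  case when size(a) >= 4 then a[3] else '' end as col4
from t;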
... View more
01-11-2019
03:55 PM
You can use the below SQL to convert the content as required:
0: jdbc:hive2://localhost> select * from temptable;
+---------------------+--+
| temptable.fullname |
+---------------------+--+
| abc xxx! def ghi |
| abc systems! |
+---------------------+--+
0: jdbc:hive2://localhost> select a[0] as LAST, substr(a[1], 0, length(a[1])-1) as TITLE, a[2] as FIRST, a[3] as MID from (
0: jdbc:hive2://localhost> select case
0: jdbc:hive2://localhost> when instr(fullname, "!" )=length(fullname) then array('', '!', substr(fullname, 0, length(fullname)-1), '')
0: jdbc:hive2://localhost> else split(fullname, " ") end as a from temptable) t;
+-------+--------+--------------+------+--+
| last | title | first | mid |
+-------+--------+--------------+------+--+
| abc | xxx | def | ghi |
| | | abc systems | |
+-------+--------+--------------+------+--+
If my answer helped to resolve the issue, please accept. It might help others in the community.
... View more
10-15-2018
06:26 AM
Bucketed tables allow much more efficient sampling than non-bucketed tables. With sampling, we can try out queries on a section of the data for testing and debugging when the original data sets are very large. False positives are possible when filtering on clustered-by columns, whereas applying filters on a partition column is better at pruning the exact files. Bucketed tables also help in map joins, assuming the data is equally distributed, since bucketing eliminates the cost of partitioning the data while joining. Bucketing is useful with high-cardinality columns, as it reduces the number of small files and clusters them within the specified buckets. Prior to HIVE-11525, bucketing wasn't able to filter files if multiple columns were clustered by.
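As a rough illustration of the sampling benefit (the table user_events and its columns are hypothetical):
-- Bucketed table: rows are hashed into 32 bucket files by user_id
create table user_events (user_id bigint, event string)
clustered by (user_id) into 32 buckets
stored as orc;
-- Sampling a single bucket reads roughly 1/32 of the data instead of scanning the whole table
select * from user_events tablesample (bucket 1 out of 32 on user_id);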
... View more
10-13-2018
09:00 AM
In a count(*) query, the final aggregation vertex should always be a single task, which fetches the counts from all the mappers and sums them up. If my response helped with your query, accept the answer. It might help others in the community.
... View more
10-13-2018
04:40 AM
Assume the below sample data:
name,age,dept
'naresh',25,'IT'
'sami',24,'ECE'
Creating a table partitioned by (name) and clustered by (age, dept) into 1 bucket gives 2 partition folders inside the table path:
<table_path>/name=naresh/000000_0
<table_path>/name=sami/000000_0
Whereas in the case of clustered by (name, age, dept) into 1 bucket, the bucket file is chosen as hashcode(name, age, dept) % number of buckets. Clustered by scatters the data within buckets based on the hash code, whereas with partition by you get an explicit folder per unique value of the column, which gives better performance if filters are applied on the partition column. If my response helped with your query, accept the answer. It might help others in the community.
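To make the two layouts concrete, a minimal DDL sketch (the table names are hypothetical; columns follow the sample data above):
-- Partitioned by name, clustered by (age, dept): one folder per distinct name
create table emp_part (age int, dept string)
partitioned by (name string)
clustered by (age, dept) into 1 buckets
stored as textfile;
-- Clustered by (name, age, dept) only: rows are hashed into bucket files, with no per-value folders to prune
create table emp_clust (name string, age int, dept string)
clustered by (name, age, dept) into 1 buckets
stored as textfile;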
... View more
10-13-2018
04:24 AM
Yes, indexes automatically help in filtering data when your select query has filters on the indexed column. You don't need to explicitly select from the index.
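For example, a compact index on the filter column would be built roughly like this (table and column names are hypothetical; note that Hive indexes were removed in Hive 3.0):
-- Create and build a compact index on the column used in filters
create index idx_employee_dept on table employee (dept) as 'COMPACT' with deferred rebuild;
alter index idx_employee_dept on employee rebuild;
-- Filters on the indexed column can then use the index
-- (assuming hive.optimize.index.filter is enabled)
select * from employee where dept = 'IT';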
... View more
10-11-2018
07:19 PM
HIVE-20593 is fixed in the Hive open-source community; I suspect applying this code fix in your current cluster will help to resolve your issue. You can contact Hortonworks support to validate my suspicion and get a hotfix.
... View more
10-10-2018
06:31 PM
I suspect HIVE-20593 should help to fix this issue.
... View more
10-10-2018
06:16 PM
To see all values of a session:
1) HS2:
beeline --outputformat=tsv2 -u '<HS2_JDBC_URL>' -e 'set -v' > session.txt
2) CLI:
hive --outputformat=tsv2 -e 'set -v' > session.txt
If a specific key needs to be checked, then after launching beeline or the CLI, query that key as below:
beeline> set hive.exec.dynamic.partition.mode;
If my suggestion helped to resolve your query, accept the answer. It might help others in the community.
... View more
10-10-2018
02:57 PM
Glad to hear that the issue is resolved. You can either create the local folder and grant permissions yourself, or Hive will create hive.exec.local.scratchdir and set the permissions configured in hive.scratch.dir.permission, provided the user has privileges on the parent directory. If my answer helped to resolve the issue, accept the answer. It might help others in the community.
... View more
10-09-2018
11:08 AM
Assuming you want to connect to HS2: apart from beeline, you can connect using JDBC/ODBC as explained in the wiki.
... View more
10-05-2018
02:59 PM
1 Kudo
Please try the below query; it should return the required results:
select text from t1 join match where instr(text, hint) != 0;
If my suggestion helped to solve the problem, accept the answer. It might help others in the community.
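A minimal reproduction of the idea with hypothetical table contents (strict mode may require allowing cartesian products, since the join has no ON clause):
-- Hypothetical sample data
create table t1 (text string);
create table match (hint string);
insert into t1 values ('hive on tez'), ('plain text');
insert into match values ('tez');
-- Keeps only the rows of t1 whose text contains a hint from match
select text from t1 join match where instr(text, hint) != 0;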
... View more
10-05-2018
02:28 PM
This issue is happening because the user infa does not have the privilege to create a local directory under /tmp/. You can launch Hive with a custom local directory on which the infa user does have privileges:
hive --hiveconf hive.exec.local.scratchdir=<local machine path>
... View more
10-04-2018
05:24 PM
When applying ORDER BY twice, Hive assumes the column selection order is the same in both the inner and outer query, so it ends up ordering by the wrong column and assuming the wrong column datatype. You can rewrite your query so that the outer and inner query projections are the same:
SELECT t_6.color color, t_6.noq_empty noq_empty
FROM
(SELECT t_5.color_gbakc2 color,
t_5.noq_empty_gbakc3 noq_empty
FROM
(SELECT testtable.color_gbakc2 color_gbakc2,
testtable.noq_empty_gbakc3 noq_empty_gbakc3,
testtable.color_gbakc1 color_gbakc1
FROM testtable testtable
WHERE testtable.color_gbakc2 IN ('Red',
'Blue',
'Green') ) t_5
WHERE t_5.color_gbakc2 IN ('Red',
'Blue')
ORDER BY noq_empty ASC ) t_6
ORDER BY color
Fortunately, HIVE-6348 is a plan optimizer change and it eliminates this issue as well. If my answer helped you, accept the answer. It will help others in the community.
... View more
10-04-2018
11:54 AM
Statistics estimation is approximated based on the available stats. In your case, I see basic stats are collected, but column stats are not. To estimate the number of distinct values for a column, can you run
analyze table temp.test_distinct compute statistics for columns;
and check the explain plan?
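Once collected, the column stats (including the number of distinct values) can be verified per column, for example (the column name is a placeholder):
-- Shows min/max/distinct count etc. for the column once stats exist
describe formatted temp.test_distinct <column_name>;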
... View more
10-04-2018
09:06 AM
HIVE-6348 should resolve this issue; I don't see any workaround for it. This fix is available in HDP-3.
... View more
10-03-2018
08:01 AM
1 Kudo
You might be hitting TEZ-2741. Can you try to run the query after setting the below config?
hive.compute.splits.in.am=false
... View more
09-28-2018
07:21 AM
If the first day of the week should start from Monday, change the subtraction/addition reference date to 1900-01-08 (a Monday).
--First day of the week as Monday
select date_sub('2018-09-12',pmod(datediff('2018-09-12','1900-01-08'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-10 |
+-------------+--+
--Last day of the week as Sunday
select date_add('2018-09-12',6 - pmod(datediff('2018-09-12','1900-01-08'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-16 |
+-------------+--+
... View more
09-27-2018
09:33 AM
Create the table using appropriate delimiters as shown in the below create statement:
# create table complextype(field1 string, field2 string, field3 struct<field31:string,field32:string>, field4 string) row format delimited fields terminated by ',' collection items terminated by '$' stored as textfile;
Create the CSV file using the same delimiters. In the below sample CSV data, fields are terminated by comma (,) and the complex data within a field is terminated by dollar ($):
# hadoop fs -cat /tmp/complex.csv
value1,value2,value31$value32,value4
Load and query the data:
# load data inpath '/tmp/complex.csv' into table complextype;
# select * from complextype;
+---------------------+---------------------+--------------------------------------------+---------------------+--+
| complextype.field1 | complextype.field2 | complextype.field3 | complextype.field4 |
+---------------------+---------------------+--------------------------------------------+---------------------+--+
| value1 | value2 | {"field31":"value31","field32":"value32"} | value4 |
+---------------------+---------------------+--------------------------------------------+---------------------+--+
If my answer helps to solve your problem, accept the answer. It might help others in the community.
... View more
09-25-2018
03:25 PM
1 Kudo
Using SQL:
-- First Day of the week
select date_sub('2018-09-25',pmod(datediff('2018-09-25','1900-01-07'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-23 |
+-------------+--+
-- Last Day of the week
select date_add('2018-09-25',6 - pmod(datediff('2018-09-25','1900-01-07'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-29 |
+-------------+--+
If my answer helped to solve your problem, accept the answer. It might help others in the community.
... View more
09-25-2018
11:05 AM
1 Kudo
You can write a custom UDF in Hive to pick any day of the week. You can refer to the lastDay UDF code as an example.
// Code in custom UDF FirstDayOfWeek
import java.text.SimpleDateFormat;
import java.util.Calendar;
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
Calendar calendar = Calendar.getInstance();
calendar.setTime(formatter.parse("2018-09-16")); // Actual Date string column here
calendar.set(Calendar.DAY_OF_WEEK, Calendar.SUNDAY);
System.out.println(calendar.getTime());
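Once such a UDF is packaged into a jar, registering and calling it from Hive would look roughly like this (the jar path, class name, function name, and table/column are hypothetical):
-- Register the custom UDF and call it on a date column
add jar /tmp/first-day-of-week-udf.jar;
create temporary function first_day_of_week as 'com.example.udf.FirstDayOfWeek';
select first_day_of_week(order_date) from orders;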
... View more