Member since: 08-18-2017
Posts: 72
Kudos Received: 10
Solutions: 14
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 657 | 06-24-2019 01:56 PM
 | 415 | 10-13-2018 04:40 AM
 | 331 | 10-13-2018 04:24 AM
 | 940 | 10-05-2018 02:59 PM
 | 914 | 10-04-2018 09:06 AM
01-29-2021
03:01 PM
I suspect the cause of the "alter table" slowness is the same as the one described in HIVE-23959, and HIVE-23806 might help your situation.
06-24-2019
01:56 PM
These WARN logs should not cause any issue. However, if you want to remove these configs:

1) You can use the syntax below to delete a config key from a specific component config_type (configs can be searched in the Ambari -> Hive configs filter box to find out which file needs to be updated):

/var/lib/ambari-server/resources/scripts/configs.py -u <<username>> -p <<password>> -n <<clustername>> -l <<ambari-server-host>> -t <<ambari-server-port>> -a <<action>> -c <<config_type>> -k <<config-key>>

e.g.,

/var/lib/ambari-server/resources/scripts/configs.py -u admin -p <<dummy>> -n cluster1 -l ambari-server-host -t 8080 -a delete -c hive-site -k hive.mapred.strict
/var/lib/ambari-server/resources/scripts/configs.py -u admin -p <<dummy>> -n cluster1 -l ambari-server-host -t 8080 -a delete -c hive-site -k hive.mapred.supports.subdirectories

This is the reference for configs.py: https://cwiki.apache.org/confluence/display/AMBARI/Modify+configurations#Modifyconfigurations-Editconfigurationusingconfigs.py

2) To remove the log4j warning, go to Ambari -> Hive configs -> Advanced hive-log4j and comment out the line below:

log4j.appender.DRFA.MaxFileSize

After the above 2 changes, restart the Hive services and all three WARNs should go away. If this helps to resolve the issue, accept the answer; it might also help other members in the community.
04-22-2019
06:05 PM
You can try the query below to return empty results instead of the actual value:

select regexp_replace(salary, ".*", "") from employee;

If the intention is to mask a column, you can use a Ranger masking policy for the table column or use the mask UDFs, e.g.,

select mask(salary) from employee;
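For illustration, a minimal sketch of what the built-in mask UDF does, using a made-up literal value (by default uppercase letters become 'X', lowercase letters 'x' and digits 'n'):

-- Hypothetical value, not from the original table
select mask('Salary-1200');
-- Expected result: Xxxxxx-nnnn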
04-22-2019
04:29 PM
with t as (select *,row_number() over(partition by statename order by value desc ) as num from air_quality where measureid = 87 and reportyear >= 2008 and reportyear <= 2013)
select statename, measureid, reportyear, value from t where num=1 order by value desc limit 5
01-14-2019
03:57 AM
If the input is text data, it's better to write a Java utility which reads the file line by line, applies all the rules and re-writes it as four columns as per the required conversion. Then load that new file into Hive to query it. It will be cumbersome to do this in SQL, as it needs a case statement nested inside another case statement to validate the length of the split. A rough sketch of such a utility is included below.
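As a rough sketch only (not the exact rules from your data; the file paths, whitespace delimiter and pad-to-four-columns rule below are placeholder assumptions), the utility could look like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.Arrays;

public class FourColumnRewriter {
    public static void main(String[] args) throws Exception {
        // Hypothetical input/output paths; replace with your own
        try (BufferedReader in = new BufferedReader(new FileReader("input.txt"));
             PrintWriter out = new PrintWriter(new FileWriter("output.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Placeholder rule: split on whitespace, pad or truncate to 4 columns
                String[] parts = line.trim().split("\\s+");
                String[] cols = Arrays.copyOf(parts, 4);
                for (int i = 0; i < 4; i++) {
                    if (cols[i] == null) cols[i] = "";
                }
                out.println(String.join("\t", cols));
            }
        }
    }
}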
01-11-2019
03:55 PM
You can use the SQL below to convert the content as required:

0: jdbc:hive2://localhost> select * from temptable;
+---------------------+--+
| temptable.fullname |
+---------------------+--+
| abc xxx! def ghi |
| abc systems! |
+---------------------+--+
0: jdbc:hive2://localhost> select a[0] as LAST, substr(a[1], 0, length(a[1])-1) as TITLE, a[2] as FIRST, a[3] as MID from (
0: jdbc:hive2://localhost> select case
0: jdbc:hive2://localhost> when instr(fullname, "!" )=length(fullname) then array('', '!', substr(fullname, 0, length(fullname)-1), '')
0: jdbc:hive2://localhost> else split(fullname, " ") end as a from temptable) t;
+-------+--------+--------------+------+--+
| last | title | first | mid |
+-------+--------+--------------+------+--+
| abc | xxx | def | ghi |
| | | abc systems | |
+-------+--------+--------------+------+--+

If my answer helped to resolve the issue, please accept. It might help others in the community.
10-15-2018
06:26 AM
Bucketed tables allow much more efficient sampling than non-bucketed tables. With sampling, we can try out queries on a section of the data for testing and debugging purposes when the original data sets are very huge. False positives are possible when filtering on clustered-by columns, whereas applying filters on a partition column is better at pruning the exact files. Bucketing also helps in map joins, assuming the data is equally distributed, as it eliminates the cost of partitioning the data while joining. Bucketing is useful with high-cardinality columns, as it reduces the number of small files and clusters them within the specified buckets. Prior to HIVE-11525, bucketing wasn't able to filter files if multiple columns were clustered by. A small sampling example is sketched below.
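As a minimal sketch of the sampling benefit (the table and column names below are made up for illustration), TABLESAMPLE on a bucketed table can read just one bucket instead of scanning everything:

-- Hypothetical bucketed table
create table orders_bucketed (order_id bigint, customer_id bigint, amount double)
clustered by (customer_id) into 8 buckets
stored as orc;

-- Reads roughly 1/8th of the data by scanning a single bucket
select * from orders_bucketed tablesample(bucket 1 out of 8 on customer_id);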
10-13-2018
09:00 AM
In a count(*) query, the final aggregation vertex should always be a single task, which fetches the counts from all the mappers and sums them up. If my response helped with your query, accept the answer. It might help others in the community.
10-13-2018
04:40 AM
Assume the sample data below:

name,age,dept
'naresh',25,'IT'
'sami',24,'ECE'

Creating a table partitioned by (name) and clustered by (age, dept) into 1 bucket gives 2 partition folders inside the table path:

<table_path>/name=naresh/000000_0
<table_path>/name=sami/000000_0

Whereas in the case of clustered by (name, age, dept) into 1 bucket, the bucket for each row is determined by hashcode(name, age, dept) % number of buckets.

Clustered by scatters the data within buckets based on the hashcode, whereas with partition by you get an explicit folder per unique value of the column, which gives better performance if filters are applied on the partition column (the DDL for both variants is sketched at the end of this reply). If my response helped with your query, accept the answer. It might help others in the community.
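A minimal sketch of the two DDL variants described above (1 bucket only mirrors the example; real tables would normally use more buckets):

-- Variant 1: partitioned by name, clustered by (age, dept)
create table emp_part (age int, dept string)
partitioned by (name string)
clustered by (age, dept) into 1 buckets
stored as textfile;

-- Variant 2: no partitions, clustered by (name, age, dept)
create table emp_bucketed (name string, age int, dept string)
clustered by (name, age, dept) into 1 buckets
stored as textfile;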
10-13-2018
04:24 AM
Yes, indexes automatically help in filtering data when your select query has filters on the indexed column. You don't need to explicitly select from the index.
10-11-2018
07:19 PM
HIVE-20593 is fixed in the Hive open-source community. I suspect that applying this code fix to your current cluster will help to resolve your issue. You can contact Hortonworks support to validate this suspicion and get a hotfix.
10-10-2018
06:31 PM
I suspect HIVE-20593 should help to fix this issue.
10-10-2018
06:16 PM
To see all the values of a session:

1) HS2:
beeline --outputformat=tsv2 -u '<HS2_JDBC_URL>' -e 'set -v' > session.txt

2) CLI:
hive --outputformat=tsv2 -e 'set -v' > session.txt

If a specific key needs to be searched, then after launching beeline or the CLI, query the specific key as below:

beeline> set hive.exec.dynamic.partition.mode;

If my suggestion helped to resolve your query, accept the answer. It might help others in the community.
10-10-2018
02:57 PM
Glad to hear that the issue is resolved. You can either create the local folder and give permissions yourself, or Hive will create hive.exec.local.scratchdir and apply the permissions configured in hive.scratch.dir.permission, provided the user has privileges on the parent directory. If my answer helped to resolve the issue, accept the answer. It might help others in the community.
10-09-2018
11:08 AM
Assuming you want to connect to HS2, apart from beeline you can connect using JDBC/ODBC as explained in the wiki; a minimal JDBC example is sketched below.
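A minimal JDBC sketch, assuming the Hive JDBC driver jar is on the classpath; the host, port, database and credentials below are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Hs2JdbcExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust host, port, database and credentials
        String url = "jdbc:hive2://hs2-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}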
10-05-2018
02:59 PM
1 Kudo
Please try the query below; it should return the required results:

select text from t1 join match where instr(text, hint) != 0;

If my suggestion helped to solve the problem, accept the answer. It might help others in the community.
10-05-2018
02:28 PM
This issue is happening because the user infa does not have the privilege to create a local directory in /tmp/. You can launch hive with a custom local directory on which the infa user does have privileges:

hive --hiveconf hive.exec.local.scratchdir=<local machine path>
10-04-2018
05:24 PM
When applying order by twice, Hive assumes the column selection order is the same on both the inner and outer query, which makes it order by the wrong column and assume the wrong column datatype. You can rewrite your query so that the outer and inner query projections are the same:

SELECT t_6.color color, t_6.noq_empty noq_empty
FROM
(SELECT t_5.color_gbakc2 color,
t_5.noq_empty_gbakc3 noq_empty
FROM
(SELECT testtable.color_gbakc2 color_gbakc2,
testtable.noq_empty_gbakc3 noq_empty_gbakc3,
testtable.color_gbakc1 color_gbakc1
FROM testtable testtable
WHERE testtable.color_gbakc2 IN ('Red',
'Blue',
'Green') ) t_5
WHERE t_5.color_gbakc2 IN ('Red',
'Blue')
ORDER BY noq_empty ASC ) t_6
ORDER BY color

Fortunately HIVE-6348 is a plan optimizer and it eliminates this issue as well. If my answer helped you, accept the answer. It will help others in the community.
10-04-2018
11:54 AM
Statistics estimation is approximated based on the available stats. In your case, I see basic stats are collected, but column stats are not. To estimate the number of distinct values for a column, can you run

analyze table temp.test_distinct compute statistics for columns;

and check the explain plan again. A way to verify the collected column stats is sketched below.
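Once the column stats are collected, you can verify them per column; the column name id below is a hypothetical placeholder for one of your actual columns:

-- Shows min, max, num_nulls, distinct_count etc. for the column
describe formatted temp.test_distinct id;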
10-04-2018
09:06 AM
HIVE-6348 should resolve this issue. I don't see any workaround for it. This fix is available in HDP-3.
10-03-2018
08:01 AM
1 Kudo
You might be hitting TEZ-2741. Can you try to run the query after setting the config below?

hive.compute.splits.in.am=false
09-28-2018
07:21 AM
If the first day of the week should be Monday, change the subtraction/addition reference date to 1900-01-08.

--First day of the week as Monday
select date_sub('2018-09-12',pmod(datediff('2018-09-12','1900-01-08'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-10 |
+-------------+--+
--Last day of the week as Sunday
select date_add('2018-09-12',6 - pmod(datediff('2018-09-12','1900-01-08'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-16 |
+-------------+--+
09-27-2018
09:33 AM
Create the table using appropriate delimiters as shown in the create statement below:

# create table complextype(field1 string, field2 string, field3 struct<field31:string,field32:string>, field4 string) row format delimited fields terminated by ',' collection items terminated by '$' stored as textfile;

Create the CSV files using matching delimiters. In the sample CSV data below, fields are terminated by comma (,) and the complex data within the field is terminated by dollar ($):

# hadoop fs -cat /tmp/complex.csv
value1,value2,value31$value32,value4

Load and query the data:

# load data inpath '/tmp/complex.csv' into table complextype;
# select * from complextype;
+---------------------+---------------------+--------------------------------------------+---------------------+--+
| complextype.field1 | complextype.field2 | complextype.field3 | complextype.field4 |
+---------------------+---------------------+--------------------------------------------+---------------------+--+
| value1 | value2 | {"field31":"value31","field32":"value32"} | value4 |
+---------------------+---------------------+--------------------------------------------+---------------------+--+
If my answer helps to solve your problem, accept the answer. It might help others in the community.
09-25-2018
03:25 PM
1 Kudo
Using SQL:

-- First Day of the week
select date_sub('2018-09-25',pmod(datediff('2018-09-25','1900-01-07'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-23 |
+-------------+--+
-- Last Day of the week
select date_add('2018-09-25',6 - pmod(datediff('2018-09-25','1900-01-07'),7));
+-------------+--+
| _c0 |
+-------------+--+
| 2018-09-29 |
+-------------+--+

If my answer helped to solve your problem, accept the answer. It might help others in the community.
09-25-2018
11:05 AM
1 Kudo
You can write a custom UDF in Hive to pick any day of the week. You can refer to the lastDay UDF code as an example.

// Code in Custom UDF FirstDayOfWeek
import java.text.SimpleDateFormat;
import java.util.Calendar;

SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
Calendar calendar = Calendar.getInstance();
calendar.setTime(formatter.parse("2018-09-16")); // Actual date string column value goes here
calendar.set(Calendar.DAY_OF_WEEK, Calendar.SUNDAY); // Jump to the Sunday of that week
System.out.println(calendar.getTime());
09-22-2018
06:30 PM
If the query progress along with the number of mapper and reducer tasks is not shown in the console, please check that the properties below are set to true (a quick beeline check is shown after the list):

hive.server2.in.place.progress
hive.tez.exec.inplace.progress
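A quick way to check their current values from a beeline session (a sketch; whether they can be overridden per session depends on your cluster's settings):

beeline> set hive.server2.in.place.progress;
beeline> set hive.tez.exec.inplace.progress;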
09-21-2018
04:29 PM
1 Kudo
Create a text table with comma (,) as the field delimiter:

create table textcomma(age int, name string) row format delimited fields terminated by ',' stored as textfile;
insert into textcomma values(1,'a'),(2,'b'),(3,'c');

Option 1: CTAS a text table with pipe (|) as the field delimiter:

create table textpipe row format delimited fields terminated by '|' stored as textfile as select * from textcomma;

# hadoop fs -cat /apps/hive/warehouse/textcomma/000000_0
1,a
2,b
3,c

# hadoop fs -cat /apps/hive/warehouse/textpipe/000000_0
1|a
2|b
3|c

Option 2: Insert overwrite directory to write a textfile with pipe (|) delimiter:

INSERT OVERWRITE DIRECTORY '/tmp/text-pipe'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE
SELECT * FROM textcomma;

# hadoop fs -cat /tmp/text-pipe/000000_0
1|a
2|b
3|c

If it helps to solve your query, accept the answer. It might help others.
09-19-2018
07:08 PM
Can you try adding your custom jar at the /usr/hdp/<version>/hive/auxlib/test-1.0.0.jar path on the HS2 node and restarting it. Check that the hive.aux.jars.path conf contains the UDF jar, then rerun your query.
09-16-2018
02:36 PM
Is it an MR job? What is the value of hive.execution.engine? Are you using a specific queue to launch this job? It seems to be a resource-unavailability issue. If the AppMaster is launched, can you collect the application_1536988895253_0006 logs and check why the task containers are not getting launched yet? The log-collection command is sketched below.
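For reference, assuming YARN log aggregation is enabled, the application logs can be collected like this (the output file name is just an example):

# yarn logs -applicationId application_1536988895253_0006 > app_logs.txt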
09-13-2018
03:00 PM
There are multiple ways to populate Avro tables.

1) Insert into avro_table values(<col1>,<col2>..,<colN>) -- this way Hive will write the Avro files.

2) Generate the Avro files and copy them directly to '/tmp/staging'. You can read the Avro documentation on writing Avro files directly into an HDFS path.

The Avro Reader/Writer APIs take care of storing and retrieving records; we don't need to explicitly specify delimiters for Avro files. A small example of option 1 is sketched below.
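A minimal sketch of option 1, with a made-up table definition (the column names and types are placeholders):

-- Hypothetical Avro-backed table; Hive derives the Avro schema from the column definitions
create table avro_table (id int, name string) stored as avro;

-- Hive writes Avro data files under the table location
insert into avro_table values (1, 'a'), (2, 'b');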