Created 07-12-2019 02:21 PM
When I run a Hive query such as
select count(*) from mytable
it takes a long time: for a table of 27 million rows, it runs for 30 minutes. I use HDP 2.6.4 with Hive 1.2.1000 and Tez 0.7.0.
Best regards
Created 07-12-2019 02:29 PM
Good articles on tuning Hive performance: Hive_performance_tune, Tez_Performance_Tune, ExplainPlan.
This question is too broad to answer precisely, but here are my thoughts:
1. Check whether your Hive job actually starts running in the ResourceManager (rather than waiting in the queue for resources, i.e. sitting in the ACCEPTED state).
2. Check in HDFS how many files are in the directory the table points to. Too many small files will result in poor performance; consolidate the small files into larger ones, then run the query again.
3. Try running the Hive console in debug mode to see where the job spends its time.
4. Check whether there is any skew in the data; if there is, create the table with the skewed columns declared in the table definition.
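Points 2 and 4 can be sketched in HiveQL roughly as follows. This is a minimal sketch, not a tested recipe: `mytable` is the table from the question, while `mytable_skewed`, its columns, and the skewed values are hypothetical names used only for illustration. `ALTER TABLE ... CONCATENATE` only applies if the table is stored as ORC or RCFile, which the thread does not state.

```sql
-- Point 2: show the table's HDFS location; when statistics are present,
-- the table parameters also list numFiles and totalSize.
DESCRIBE FORMATTED mytable;

-- Point 2: merge small files in place.
-- Assumption: the table is stored as ORC or RCFile.
ALTER TABLE mytable CONCATENATE;

-- Show how Hive plans the query before running it again.
EXPLAIN SELECT COUNT(*) FROM mytable;

-- Point 4: declare skewed values when creating a table.
-- Table name, columns, and values are hypothetical.
CREATE TABLE mytable_skewed (
  id BIGINT,
  customer_id STRING
)
SKEWED BY (customer_id) ON ('c1', 'c2');
```

If `numFiles` is very large compared with `totalSize`, that points toward the small-files problem described in point 2.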
Created 07-15-2019 08:36 AM
@Shu By default, hive.exec.reducers.bytes.per.reducer is set to 64 MB. As I am on Hadoop 2.7, should I set it to 128 MB, or to 256 MB as indicated in the documentation you pointed me to?
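Either value can be tried for the current session only, before changing anything cluster-wide. A minimal sketch, assuming the same `mytable` as above; the byte values are simply 128 MB and 256 MB and are not a recommendation:

```sql
-- Print the value currently in effect for this session.
SET hive.exec.reducers.bytes.per.reducer;

-- Try 128 MB (134217728 bytes) and time the query.
SET hive.exec.reducers.bytes.per.reducer=134217728;
SELECT COUNT(*) FROM mytable;

-- Try 256 MB (268435456 bytes) and time the query again.
SET hive.exec.reducers.bytes.per.reducer=268435456;
SELECT COUNT(*) FROM mytable;
```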