Performance issue with HIVE

We are using HDP 2.6 and are new to the technology. We have not applied any advanced performance tuning settings. We are executing the queries below:

1.

	create table customer_partitioned
	(id int, name string, email string, state string)
	partitioned by (signup date)
	clustered by (id) into 2 buckets stored as orc
	tblproperties("transactional"="true");  -- 5 Seconds 

	Insert into table customer_partitioned PARTITION(signup)
	values (1,'Prathamesh','ph@gmail.com','MAH','2018-05-01');
	Insert into table customer_partitioned PARTITION(signup)
	values (1,'Anirudh','ad@gmail.com','GOA','2018-05-02');
	Insert into table customer_partitioned PARTITION(signup)
	values (1,'Sagar','ss@gmail.com','KER','2018-05-03');
	Insert into table customer_partitioned PARTITION(signup)
	values (1,'Sumeet','sp@gmail.com','PUN','2018-05-04');
	Insert into table customer_partitioned PARTITION(signup)
	values (1,'Rohit','rs@gmail.com','UP','2018-05-05');

-- It took 7 minutes to insert 5 rows.

2.

	UPDATE customer_partitioned SET state = 'RAJ' WHERE customer_partitioned.name = 'Prathamesh';  -- 2 mins

3.

	DELETE FROM customer_partitioned WHERE customer_partitioned.name = 'Rohit';  -- 7 mins

Do we have to configure any parameters to make these run faster?

Mentor

@Anirudh D

Have a look at the various options available to speed up the execution of your Hive queries: "5 Ways to Make Your Hive Queries Run Faster".

Hope that helps

Mentor

@Anirudh D

Any updates? Did your execution speed improve?

@Geoffrey Shelton Okot Those parameters were already set appropriately in the global Hive config, so the timings I reported were obtained with those parameters already in effect.

Mentor

@Anirudh D

By default Hive uses the MR engine. Did you try using Tez?

set hive.execution.engine=tez;
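
To confirm which engine a session is actually using, a minimal sketch, assuming a Beeline or Hive CLI session: issuing `set` with only the property name prints its current value instead of changing it.

```sql
-- Print the engine currently in effect for this session (no value is assigned).
set hive.execution.engine;

-- Switch this session to Tez, then print the value again to confirm the change.
set hive.execution.engine=tez;
set hive.execution.engine;
```

A session-level `set` only affects the current session; the global Hive config is left unchanged.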

@Geoffrey Shelton Okot

Please find attached the Hive config settings (hive-parameters.zip).

Also, for a volume as small as 5 rows, why do we need to configure all these parameters?

Mentor

@Anirudh D

If you are doing complex joins, vectorized query execution improves the performance of operations like scans, aggregations, filters and joins. For the 5-row test, compare the timings with and without the parameters.

You will need to test most of those parameters yourself, timing each execution, to get the desired performance.
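
A minimal sketch of such a comparison, assuming a Beeline or Hive CLI session and the customer_partitioned table from the question. The aggregate query is only an illustrative choice and does not come from this thread. Both clients report the elapsed time after each statement, so the two runs can be compared directly.

```sql
-- Assumption: both runs happen in the same session so that only one setting differs.
set hive.execution.engine=tez;

-- Run 1: vectorized execution disabled
set hive.vectorized.execution.enabled=false;
select state, count(*) from customer_partitioned group by state;

-- Run 2: vectorized execution enabled
set hive.vectorized.execution.enabled=true;
select state, count(*) from customer_partitioned group by state;
```

On a table of 5 rows any difference is likely to be dominated by job startup time, so it is worth repeating each run a few times before drawing conclusions.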

Hope that helps
