Created 05-17-2018 12:02 PM
We are using HDP 2.6 and are new to this technology. We have not applied any advanced performance tuning settings. We are executing the queries below:
1.
create table customer_partitioned (id int, name string, email string, state string) partitioned by (signup date) clustered by (id) into 2 buckets stored as orc tblproperties("transactional"="true"); -- 5 seconds
Insert into table customer_partitioned PARTITION(signup) values (1,'Prathamesh','ph@gmail.com','MAH','2018-05-01');
Insert into table customer_partitioned PARTITION(signup) values (1,'Anirudh','ad@gmail.com','GOA','2018-05-02');
Insert into table customer_partitioned PARTITION(signup) values (1,'Sagar','ss@gmail.com','KER','2018-05-03');
Insert into table customer_partitioned PARTITION(signup) values (1,'Sumeet','sp@gmail.com','PUN','2018-05-04');
Insert into table customer_partitioned PARTITION(signup) values (1,'Rohit','rs@gmail.com','UP','2018-05-05');
-- It took 7 minutes to insert 5 rows.
2.
UPDATE customer_partitioned SET state = 'RAJ' WHERE customer_partitioned.name = 'Prathamesh'; -- 2 mins
3.
DELETE FROM customer_partitioned WHERE customer_partitioned.name = 'Rohit'; -- 7 mins
Do we have to configure any parameters to get these to run faster?
Created 05-17-2018 01:33 PM
Have a look at the various options available to speed up the execution of your Hive queries: "5 Ways to Make Your Hive Queries Run Faster".
Hope that helps
Created 05-18-2018 06:18 AM
Any updates? Did your execution speed improve?
Created 05-18-2018 06:53 AM
@Geoffrey Shelton Okot Those parameters were already set appropriately in the global Hive config, so the timings mentioned above were already obtained with those parameters in effect.
Created 05-18-2018 07:06 AM
Created 05-21-2018 08:36 AM
Please find attached the Hive config settings (hive-parameters.zip).
Also, for a volume as small as 5 rows, why do we need to configure all these parameters?
Created 05-21-2018 09:50 AM
If you are doing complex joins, vectorized query execution improves the performance of operations like scans, aggregations, filters and joins. For the 5-row test, compare the execution times with and without the parameters.
You will need to test most of those parameters yourself, timing each execution, to find the settings that give the desired performance.
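A minimal sketch of such a with/without comparison, run in a single Beeline or Hive CLI session. The two vectorization properties are standard Hive settings; the aggregate query is only a hypothetical example against the table from this thread, so substitute whichever statement you actually want to time. No timings are shown here because they depend entirely on the cluster.

```sql
-- Run 1: vectorized execution switched off for this session only.
SET hive.vectorized.execution.enabled=false;
SET hive.vectorized.execution.reduce.enabled=false;

-- Hypothetical test query (an assumption, not taken from the original post).
SELECT state, COUNT(*) AS customers
FROM customer_partitioned
GROUP BY state;

-- Run 2: vectorized execution switched on, same query.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

SELECT state, COUNT(*) AS customers
FROM customer_partitioned
GROUP BY state;
```

Beeline prints the elapsed time after each statement, so the two runs can be compared directly. SET changes made this way apply only to the current session and do not alter the global Hive config.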
Hope that helps