Contributor
Posts: 45
Registered: ‎01-24-2016

Quick speed comparison stats between hive on spark, hive mr, phoenix and impala

Hi guys

 

I just set up Phoenix 4.5.2-1.clabs_phoenix1.2.0.p0.774 through Cloudera Manager on CDH 5.6.0. My dev cluster is:

3 boxes

Each is an HP 8300 with 8 cores and 32 GB RAM

1 NameNode (NN) and 3 DataNodes (DN)

 

DDL (this table is created in Phoenix on HBase as well as in Hive) 

====

CREATE TABLE IF NOT EXISTS resume_dates (resid VARCHAR, cd VARCHAR, uts BIGINT CONSTRAINT pk PRIMARY KEY (resid));

 

Sample Data

==========

14008_1_1000522248_0_1108045212,2014-01-30,1391093927

14025_1_1010236513_0_1107883638,2014-01-30,391093930

 

 

Num of records 

============

23,748,651

 

Query

=====

select substr(cd, 1,4) as yyyy, count(resid) from RESUME_DATES group by substr(cd, 1,4) order by yyyy asc
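
For anyone reading the query quickly: it takes the first four characters of cd (the year), counts rows per year, and sorts by year. Here is a rough Python sketch of the same logic run on just the two sample rows above. The function name count_by_year is only for illustration; it is not part of Hive, Phoenix or Impala.

```python
from collections import Counter

# The two sample rows from the post: (resid, cd, uts)
rows = [
    ("14008_1_1000522248_0_1108045212", "2014-01-30", 1391093927),
    ("14025_1_1010236513_0_1107883638", "2014-01-30", 391093930),
]

def count_by_year(rows):
    """Mimic: select substr(cd, 1, 4), count(resid) ... group by ... order by yyyy asc."""
    # substr(cd, 1, 4) is 1-based in SQL, so it maps to cd[:4] in Python.
    # count(resid) only counts rows where resid is not NULL.
    counts = Counter(cd[:4] for resid, cd, uts in rows if resid is not None)
    return sorted(counts.items())

result = count_by_year(rows)
print(result)  # [('2014', 2)]
```

On the full 23,748,651-row table every engine has to do this same scan-and-group work, which is why the query is a reasonable (if small) aggregation benchmark.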

 

Comparison of Timings

==================

Hive on MR    = 81.829 seconds

Hive on Spark = 32.78 seconds

Phoenix       = 12.234 seconds

Impala        = 0.99 seconds
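
To put the timings in relative terms, here is a small Python sketch that divides the Hive on MR time by each engine's time. The numbers are the ones reported in this post (one query on this dev cluster), and the variable names are only illustrative.

```python
# Timings in seconds, as reported in the post
timings = {
    "Hive on MR": 81.829,
    "Hive on Spark": 32.78,
    "Phoenix": 12.234,
    "Impala": 0.99,
}

# Use Hive on MR as the baseline and compute how many times faster each engine is
baseline = timings["Hive on MR"]
speedups = {engine: baseline / secs for engine, secs in timings.items()}

for engine, factor in speedups.items():
    print(f"{engine}: {factor:.1f}x")
# Hive on MR: 1.0x
# Hive on Spark: 2.5x
# Phoenix: 6.7x
# Impala: 82.7x
```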

 

Thanks

 

sanjay

New Contributor
Posts: 2
Registered: ‎09-01-2017

Re: Quick speed comparison stats between hive on spark, hive mr, phoenix and impala

Migrating from Hive on MR to Hive on Spark

I'm wondering why Hive, run from an Oozie action (oozie:hive2-action:0.1) with Spark as the execution engine (set hive.execution.engine=spark), is much slower than Hive on MapReduce.
Note: I added set hive.execution.engine=spark; to my queries, and in the Oozie workflow I declared hive2-action:0.1 in the xmlns and provided the JDBC URL. The job runs successfully (I checked the logs), but it takes much more wall-clock time than the usual MR run.