
Quick speed comparison stats between hive on spark, hive mr, phoenix and impala

Hi guys

 

I just set up Phoenix 4.5.2-1.clabs_phoenix1.2.0.p0.774 through Cloudera Manager on CDH 5.6.0. My dev cluster is:

3 boxes

Each is an HP 8300 with 8 cores and 32 GB RAM

1 NN and 3 DNs

 

DDL (this table is created in Phoenix on HBase as well as in Hive) 

====

CREATE TABLE IF NOT EXISTS resume_dates (resid VARCHAR, cd VARCHAR, uts BIGINT CONSTRAINT pk PRIMARY KEY (resid));

 

Sample Data

==========

14008_1_1000522248_0_1108045212,2014-01-30,1391093927

14025_1_1010236513_0_1107883638,2014-01-30,391093930


Number of records

=================

23,748,651

 

Query

=====

select substr(cd, 1,4) as yyyy, count(resid) from RESUME_DATES group by substr(cd, 1,4) order by yyyy asc
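To make the query's semantics concrete, here is a minimal sketch that loads the two sample rows into an in-memory SQLite database and runs the same GROUP BY. SQLite is only a stand-in assumption here, not one of the engines being compared, and this says nothing about performance. The DDL is adapted to SQLite syntax by adding a comma before the table-level CONSTRAINT clause.

```python
import sqlite3

# Stand-in only: SQLite illustrates what the query returns on the sample rows;
# it is not Hive, Phoenix, or Impala.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS resume_dates "
    "(resid VARCHAR, cd VARCHAR, uts BIGINT, CONSTRAINT pk PRIMARY KEY (resid))"
)

# The two sample rows from the post, copied as-is.
sample_rows = [
    ("14008_1_1000522248_0_1108045212", "2014-01-30", 1391093927),
    ("14025_1_1010236513_0_1107883638", "2014-01-30", 391093930),
]
conn.executemany("INSERT INTO resume_dates VALUES (?, ?, ?)", sample_rows)

# substr(cd, 1, 4) is 1-based and keeps the first four characters: the year.
rows = conn.execute(
    "SELECT substr(cd, 1, 4) AS yyyy, count(resid) FROM RESUME_DATES "
    "GROUP BY substr(cd, 1, 4) ORDER BY yyyy ASC"
).fetchall()
print(rows)
```

Both sample rows share the cd prefix 2014, so they fall into a single group with a count of 2; on the full table there is one row per distinct year.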

 

Comparison of Timings

==================

Hive on MR    = 81.829 seconds

Hive on Spark = 32.78 seconds

Phoenix       = 12.234 seconds

Impala        = 0.99 seconds
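The relative speedups follow directly from the timings above. A small sketch that divides the Hive on MR time by each engine's time; it assumes each number is a single reported run, since nothing is known about warm-up or caching.

```python
# Wall-clock timings reported above for the same GROUP BY query, in seconds.
timings = {
    "Hive on MR": 81.829,
    "Hive on Spark": 32.78,
    "Phoenix": 12.234,
    "Impala": 0.99,
}

baseline = timings["Hive on MR"]

# Speedup relative to Hive on MR = baseline time / engine time.
speedups = {engine: baseline / t for engine, t in timings.items()}

# Print from slowest to fastest engine.
for engine, s in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"{engine:<14} {timings[engine]:>7.3f} s  {s:5.1f}x vs Hive on MR")
```

That works out to roughly 2.5x for Hive on Spark, 6.7x for Phoenix and about 83x for Impala over Hive on MR on this cluster.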

 

Thanks

 

sanjay


Re: Quick speed comparison stats between hive on spark, hive mr, phoenix and impala

Migrating from Hive on MR to Hive on Spark

I'm wondering why a Hive query run through an Oozie hive2 action (oozie:hive2-action:0.1) on Spark (set hive.execution.engine=spark) is much slower than the same query on Hive on MapReduce.
Note: I included set hive.execution.engine=spark; in my queries, and in the Oozie workflow I declared hive2-action:0.1 in the xmlns and provided the JDBC URL. The job completes successfully and I checked the logs, but it takes much more wall-clock time than the usual MR run.