Reply
Contributor
Posts: 45
Registered: ‎01-24-2016

Slow Hive query CDH 5.6.0

Hi guys
I run a 1NN + 4DN hadoop cluster r3.4xlarge instance each.
This hive query takes close to an hour using execution engine as MR.
Seems too long !
Unfortunately I dont have a prior version to compare my timings to.
Any help or thoughts on what I may be doing wrong 
thanks
sanjay
 
"show create table sansub01.benji_bc"
 
CREATE TABLE `sansub01.benji_bc`(
  `dt` string, 
  `uts` string, 
  `ip` string, 
  `iip` int, 
  `nid` string, 
  `ws` string, 
  `did` string, 
  `cd` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://xx-xxx-xx-xx:8020/user/hive/warehouse/sansub01.db/benji_bc'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true', 
  'numFiles'='4', 
  'numRows'='7655945', 
  'rawDataSize'='738589053', 
  'totalSize'='746244998', 
  'transient_lastDdlTime'='1458767399')
 
 
"show create table sansub01.benji_q1"
 
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/jars/hive-common-1.1.0-cdh5.6.0.jar!/hive-log4j.properties
OK
CREATE TABLE `sansub01.benji_q1`(
  `qdt` string, 
  `qts` string, 
  `quts` string, 
  `qip` string, 
  `iqip` int, 
  `qnid` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://xx-xxx-xx-xx:8020/user/hive/warehouse/sansub01.db/benji_q1'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true', 
  'numFiles'='3', 
  'numRows'='9844766', 
  'rawDataSize'='668089922', 
  'totalSize'='677934688', 
  'transient_lastDdlTime'='1458767286')
 
QUERY
set hive.execution.engine=mr;
use sansub01
;
drop table if exists
  benji_bc_q1
;
create table if not exists
  benji_bc_q1
AS
SELECT 
  bc.dt,
  bc.uts,
  bc.ip,
  bc.nid,
  bc.ws,
  bc.did,
  bc.cd,
  q1.qdt,
  q1.qts,
  q1.quts,
  q1.qnid
from  
 benji_q1 q1
JOIN
 benji_bc bc
ON
  q1.iqip = bc.iip
WHERE
  q1.iqip = bc.iip
 

 

Contributor
Posts: 45
Registered: ‎01-24-2016

Re: Slow Hive query CDH 5.6.0

ok guys

something is wrong and maybe there is a clue here

 

Query: select count(*) from sansub01.benji_bc_q1

+------------+

| count(*)   |

+------------+

| 2344418707 |

+------------+

Fetched 1 row(s) in 716.47s

 

The tables I INNER JOINED  have 9 million and 7 million rows....so an inner join should have less than 9 or 7 million !