Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Converting Lab3 PIG script to HIVE

Solved Go to solution
Highlighted

Converting Lab3 PIG script to HIVE

New Contributor

In Lab3 - using PIG to calculate risk factor, the PIG script mentioned is:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by event != 'normal';
c = foreach b generate driverid, event, (int) '1' as occurance;
d = group c by driverid;
e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();


I tried doing the same using HIVE. Below is HIVE script:

CREATE TABLE riskfactor_hive STORED AS ORC AS 
select 
	t2.driverid as driverid, 
	t2.events as events, 
	dm.totmiles as totmiles, 
	dm.totmiles/t2.events as riskfactor
from
(
  select t1.driverid, count(t1.occurance) as events
	from 
	(
	  select driverid, event, 1 as occurance from geolocation
	  where event != 'normal'
	) t1
  group by t1.driverid
) t2
join drivermileage dm
on t2.driverid = dm.driverid
;

Can we further optimise the HIVE script to make it run even faster?

Much Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Converting Lab3 PIG script to HIVE

Rising Star

@Vaibhav YadavI think you can try collapsing the inner query as follows:

 select driverid, count(event)  from geolocation where event!='normal' group by driverid; 

View solution in original post

4 REPLIES 4
Highlighted

Re: Converting Lab3 PIG script to HIVE

Hi @Vaibhav Yadav, I won't comment on the SQL optimisation (perhaps someone else can), but you can do a lot of tuning to improve Hive performance and there are a lot of articles in the community to advise you. First of all, you should be using Tez as the hive execution engine instead of MapReduce and then you can do a lot of further tweaks: see for example this article which provides some advice on tuning Tez.

I'll leave the thread open for others to comment further, but this will definitely get you started!

Highlighted

Re: Converting Lab3 PIG script to HIVE

New Contributor

Thanks @Ana Gillan.. I'll try that..

Highlighted

Re: Converting Lab3 PIG script to HIVE

Rising Star

@Vaibhav YadavI think you can try collapsing the inner query as follows:

 select driverid, count(event)  from geolocation where event!='normal' group by driverid; 

View solution in original post

Re: Converting Lab3 PIG script to HIVE

New Contributor

Thanks @grajagopal. That just worked

Don't have an account?
Coming from Hortonworks? Activate your account here