
Pig risk factor tutorial,Pig syntax

I have a few questions about the Pig riskfactor tutorial.

1. On line 5 of the riskfactor Pig tutorial:

e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;

why is the sum of occurance computed using the variable c and not d?

2. On line 8:

final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;

How are the columns $0, $1 and $3 defined? Why isn't $2 used?

3. Why isn't the STORE function with HCatStorer in the Pig helper?

4. How do you know when to add Pig arguments, for instance -useHCatalog in this tutorial?

2 REPLIES

Re: Pig risk factor tutorial,Pig syntax

@Tuomas

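1. Relation d is created as

d = group c by driverid;

To see what d looks like, run

describe d;

The output shows two fields: group, which holds the driverid key, and a bag named c, which holds every tuple of c that has that key. The occurance column therefore lives inside the bag c, and SUM, which aggregates over a bag, is applied to c.occurance. The alias d names the grouped relation as a whole, so it has no occurance field of its own.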
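Here is a minimal sketch of that grouping, in case it helps to try it in the Grunt shell. The input file events.csv and its two-column schema are assumptions for illustration, not the tutorial's actual data.

```pig
-- Hypothetical input file and schema, for illustration only.
c = LOAD 'events.csv' USING PigStorage(',') AS (driverid:chararray, occurance:int);

-- GROUP yields one tuple per driverid with two fields:
--   group : the grouping key (the driverid value)
--   c     : a bag holding every tuple of c that has that key
d = GROUP c BY driverid;

-- Prints a schema roughly of the form:
--   d: {group: chararray,c: {(driverid: chararray,occurance: int)}}
DESCRIBE d;

-- SUM aggregates over a bag, so it is applied to the bag c inside d.
e = FOREACH d GENERATE group AS driverid, SUM(c.occurance) AS t_occ;
DUMP e;
```

Writing SUM(d.occurance) would fail, because within each tuple of d the only fields are group and c.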

2. Pig lets you refer to a column by its position as well as by its name. Positions start from $0. So, for example, if I create a relation as follows

data = LOAD 'sample.txt' using PigStorage(',') as (id:int, name:chararray, age:int);

I can access these fields as

someCols = foreach data generate id,age;

Or I can use the associated index as

someCols = foreach data generate $0,$2;

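Why didn't I mention the column called name, or $1? Because I don't need it. In the same way, line 8 of the tutorial skips $2 because that column is not needed in the output.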
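For the tutorial's line 8 specifically, here is a hedged sketch of where $0 to $3 could come from. It assumes h is the result of joining e with a mileage relation on driverid. The relation g, both file names, and the schemas are hypothetical stand-ins, so check the real layout with describe h.

```pig
-- Hypothetical stand-ins for the tutorial's relations.
e = LOAD 'events_per_driver.csv' USING PigStorage(',') AS (driverid:chararray, t_occ:long);
g = LOAD 'mileage.csv' USING PigStorage(',') AS (driverid:chararray, totmiles:long);

-- A join keeps every column from both inputs, so the join key appears twice.
h = JOIN e BY driverid, g BY driverid;
DESCRIBE h;
-- Fields of h by position under these assumptions:
--   $0  e::driverid
--   $1  e::t_occ
--   $2  g::driverid   (same value as $0, so it can be skipped)
--   $3  g::totmiles

-- Same projection as line 8, written with names instead of positions.
final_data = FOREACH h GENERATE
    e::driverid AS driverid,
    e::t_occ AS events,
    g::totmiles AS totmiles,
    (float) g::totmiles / e::t_occ AS riskfactor;
DUMP final_data;
```

Using the qualified names (e::driverid, g::totmiles) instead of $0 and $3 makes the script easier to read and keeps it correct if the column order changes.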

3. I didn't get that :)

4. When you need a certain functionality! For example, HCatLoader is used to read Hive tables in Pig, and for that I need the class

org.apache.hive.hcatalog.pig.HCatLoader()

It is not available by default, so I have to pass -useHCatalog as an argument when starting Pig (for example, when launching the Grunt shell).
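As a rough sketch of how this fits together: the table names and the script name below are hypothetical, and the target table for the store is assumed to already exist in Hive.

```pig
-- Start Pig with the HCatalog jars on the classpath, for example:
--   pig -useHCatalog                  (interactive Grunt shell)
--   pig -useHCatalog myscript.pig     (hypothetical script name)

-- Read a Hive table; the schema comes from the Hive metastore.
a = LOAD 'my_hive_table' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE a;

-- Write back to a Hive table. HCatStorer expects the target table
-- to exist already, with columns matching the relation being stored.
STORE a INTO 'my_output_table' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

Without -useHCatalog, Pig would typically fail to resolve the HCatLoader and HCatStorer classes, unless the HCatalog jars were registered some other way.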

Hope this helps!

Re: Pig risk factor tutorial,Pig syntax

@Tuomas

Did the answer help in the resolution of your query? Please close the thread by marking the answer as Accepted!