
## How to generate index with Pig ?

Contributor

I have telematics data like this:

X         Y

0.1      0.2

0.3      0.1

....       ....

I need to calculate the distance and add it as a third column called Dis:

X        Y        Dis

0.1    0.2      0

0.3    0.1      0.22

....     .....      ....

How can I calculate this in Pig? I think I need to generate an index column, but I still don't know how to do it.

And is there a way that doesn't require generating an index column?

Many thanks !

4 REPLIES

## Re: How to generate index with Pig ?

Contributor
If I understand your question correctly, you have a set of ordered points and you want to calculate the distance between each point and the previous point and store that distance into a new field.

In SQL, you'd use a window function to handle this kind of query.

Pig's piggybank library has an Over function, which supports lag, and you should be able to use it here. Here are the docs for Over:

http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/evaluation/Over.html

Let me know if that has enough details to get you on your way.
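
As an alternative that avoids piggybank entirely, here is a minimal sketch that generates the index with Pig's built-in RANK operator (available from Pig 0.11) and pairs each row with its predecessor through a self-join. The file name `points.csv`, the alias names, and the choice of Euclidean distance are assumptions for illustration. The sketch also assumes that RANK without a BY clause numbers rows in the order they are read, which you should verify for your own input, especially if it is split across several files.

```pig
-- Hypothetical input: one "x,y" pair per line, already in travel order.
points = LOAD 'points.csv' USING PigStorage(',') AS (x:double, y:double);

-- RANK without BY prepends a sequential row number named rank_points (long, starting at 1).
ranked = RANK points;

-- Current rows keyed by their own index.
cur = FOREACH ranked GENERATE rank_points AS idx, x, y;

-- The same rows keyed by index + 1, so row i-1 lines up with row i.
prev = FOREACH ranked GENERATE rank_points + 1L AS idx, x AS px, y AS py;

-- The left outer join keeps the first row, which has no predecessor.
joined = JOIN cur BY idx LEFT OUTER, prev BY idx;

-- Euclidean distance to the previous point; 0 for the first row.
with_dis = FOREACH joined GENERATE
    cur::idx AS idx,
    cur::x   AS x,
    cur::y   AS y,
    (prev::px IS NULL ? 0.0 :
        SQRT((cur::x - prev::px) * (cur::x - prev::px) +
             (cur::y - prev::py) * (cur::y - prev::py))) AS dis;

result = ORDER with_dis BY idx;
DUMP result;
```

For the two sample rows in the question, (0.1, 0.2) and (0.3, 0.1), the second distance is sqrt(0.05), roughly 0.22, which matches the expected Dis column.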

## Re: How to generate index with Pig ?

Contributor

Thanks joey.

This looks like the answer for me, but about the example in the link you gave:

```
A = load 'T';
B = group A by si;
C = foreach B {
    C1 = order A by d;
    generate flatten(Stitch(C1, Over(C1.f, 'sum(float)')));
}
D = foreach C generate s, $9;
```

I don't quite understand it: I can't use GROUP BY and ORDER on this telematics data. So I tried to run an example myself. I just want to DUMP column x next to a second column holding the value of x from the previous line. For example:

X: 2,  1,  3,  4 ...

B = LOAD '/user/hue/pig/drivers/drivers/1002/1.csv' using PigStorage(',') AS (x: float, y: float);
C = FOREACH B
GENERATE x , Over(x, lead, -1, 0, 1, 0);
DUMP C;

and got this error:

<file script.pig, line 16, column 13> Failed to generate logical plan. Nested exception:

org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve Over using imports:

[, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Can you give me some more specific advice? I just need to

## Re: How to generate index with Pig ?

Contributor

Sorry about the post above; this is the complete one:

Thanks joey,

This looks like the answer for me, but about the example in the link you gave:

```
A = load 'T';
B = group A by si;
C = foreach B {
    C1 = order A by d;
    generate flatten(Stitch(C1, Over(C1.f, 'sum(float)')));
}
D = foreach C generate s, $9;
```

I don't quite understand it: I can't use GROUP BY and ORDER on this telematics data. So I tried to run an example myself. I just want to DUMP column x next to a second column X' holding the value of x from the previous line. For example:

X: 2,  1,  3,  4 ... ,5 , 6.

X': 0, 2, 1, 3, ...., 5 .

B = LOAD '/user/hue/pig/drivers/drivers/1002/1.csv' using PigStorage(',') AS (x: float, y: float);
C = FOREACH B
GENERATE x , Over(x, lead, -1, 0, 1, 0);
DUMP C;

and got this error:

<file script.pig, line 16, column 13> Failed to generate logical plan. Nested exception:

org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve Over using imports:

[, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

B = LOAD '/user/hue/pig/drivers/drivers/1002/1.csv' using PigStorage(',') AS (x: float, y: float);
C = FOREACH B
GENERATE x , Over(B.x, lead, -1, 0, 1, 0);
DUMP C;

I also tried B.x instead of x (the script just above), but still got the same error.

Can you give me some more specific advice?

Thanks !
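
ERROR 1070 means Pig could not find a function named Over on its import path. The javadoc URL earlier in the thread places Over in the piggybank package, so the piggybank jar has to be registered and the class named in full (or aliased with DEFINE). Over also takes a bag as its first argument and the function name as a quoted string, so the rows have to be grouped first. A minimal sketch under those assumptions follows: the jar path is a placeholder, GROUP ... ALL sends every row to a single reducer, and the exact argument list for 'lag' should be checked against the javadoc for your Pig version.

```pig
-- Placeholder path: point this at the piggybank.jar shipped with your Pig install.
REGISTER /path/to/piggybank.jar;
DEFINE Over   org.apache.pig.piggybank.evaluation.Over();
DEFINE Stitch org.apache.pig.piggybank.evaluation.Stitch();

B = LOAD '/user/hue/pig/drivers/drivers/1002/1.csv' USING PigStorage(',') AS (x:float, y:float);

-- Number the rows so the bag can be ordered later (adds a rank_B column).
R = RANK B;

-- Over works on a bag, so collect all rows into one group.
G = GROUP R ALL;

C = FOREACH G {
    -- Over expects the bag to be ordered already.
    ordered = ORDER R BY rank_B;
    -- 'lag' is passed as a quoted string; Stitch glues the lagged value onto each row.
    GENERATE FLATTEN(Stitch(ordered, Over(ordered.x, 'lag')));
};
DUMP C;
```

For data that does not fit on one reducer, grouping by a trip or driver identifier instead of ALL would spread the work, provided the data has such a field.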

## Re: How to generate index with Pig ?

Contributor

Can someone help? I've been stuck here for a week.

Or could you just give me a link to a Pig forum?
