How to generate index with Pig ?

Contributor

I have telematics data like this:

X      Y
0.1    0.2
0.3    0.1
...    ...

I need to calculate the distance and add it as a third column called Dis:

X      Y      Dis
0.1    0.2    0
0.3    0.1    0.22
...    ...    ...


How can I calculate this in Pig? I think I need to generate an index column, but I still don't know how to do it.

Is there a way that doesn't require generating an index column?

Many thanks!

4 REPLIES 4

Re: How to generate index with Pig ?

Contributor
If I understand your question correctly, you have a set of ordered points and you want to calculate the distance between each point and the previous point and store that distance into a new field.

In SQL, you'd use a window function to handle this kind of query.

Pig's Piggybank library has an Over UDF that supports a lag function, which you should be able to use. Here are the docs for Over:

http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/evaluation/Over.html

Let me know if that has enough details to get you on your way.
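As a sketch of the same "compare each row with the previous row" idea without Piggybank, the built-in RANK operator (Pig 0.11 and later) can supply the index column, and a self-join on that index pairs each point with its predecessor. This is a swapped-in technique, not the Over UDF itself. The input path and the aliases pts, ranked, prev_pts, pairs, and dists are hypothetical, and the sketch assumes that RANK without a BY clause numbers the rows in file order, which is worth verifying when the input spans several files.

```pig
-- Hypothetical input path; columns x and y as in the question.
pts = LOAD 'points.csv' USING PigStorage(',') AS (x:double, y:double);

-- RANK with no BY clause prepends a sequential long field named rank_pts (starting at 1).
ranked = RANK pts;

-- Shift the index by one so that point i in prev_pts lines up with point i+1 in ranked.
prev_pts = FOREACH ranked GENERATE rank_pts + 1 AS idx, x AS px, y AS py;

-- The left outer join keeps the first point, which has no predecessor.
pairs = JOIN ranked BY rank_pts LEFT OUTER, prev_pts BY idx;

-- Euclidean distance to the previous point; 0 for the first row.
dists = FOREACH pairs GENERATE
    ranked::rank_pts AS idx,
    ranked::x AS x,
    ranked::y AS y,
    (prev_pts::px IS NULL ? 0.0
        : SQRT((ranked::x - prev_pts::px) * (ranked::x - prev_pts::px)
             + (ranked::y - prev_pts::py) * (ranked::y - prev_pts::py))) AS dis;

ordered = ORDER dists BY idx;
DUMP ordered;
```

For the two sample rows in the question, the second row's distance is sqrt(0.2^2 + 0.1^2), which is roughly 0.22.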

Re: How to generate index with Pig ?

Contributor

Thanks joey.

This looks like the answer for me, but I don't quite understand the example in the link you gave:

A = load 'T';
B = group A by si;
C = foreach B {
    C1 = order A by d;
    generate flatten(Stitch(C1, Over(C1.f, 'sum(float)')));
}
D = foreach C generate s, $9;

I can't use GROUP BY and ORDER on this telematics data. So I tried to run an example myself: I just want to DUMP column x together with another column whose value is x from one line above. For example:

X: 2,  1,  3,  4 ...

 

  B = LOAD '/user/hue/pig/drivers/drivers/1002/1.csv' using PigStorage(',') AS (x: float, y: float);
  C = FOREACH B
  GENERATE x , Over(x, lead, -1, 0, 1, 0);
  DUMP C;

 

and got this error:  

<file script.pig, line 16, column 13> Failed to generate logical plan. Nested exception:

org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve Over using imports:

[, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

 

I've already tried this:

 

 

Can you give me some more specific advice? I just need to

Re: How to generate index with Pig ?

Contributor

Sorry about the incomplete post above; this is the complete one:

Thanks joey,

 

This looks like the answer for me, but I don't quite understand the example in the link you gave:

A = load 'T';
B = group A by si;
C = foreach B {
    C1 = order A by d;
    generate flatten(Stitch(C1, Over(C1.f, 'sum(float)')));
}
D = foreach C generate s, $9;

I can't use GROUP BY and ORDER on this telematics data. So I tried to run an example myself: I just want to DUMP column x together with another column X' whose value is x from one line above. For example:

X: 2,  1,  3,  4 ... ,5 , 6.

X': 0, 2, 1, 3, ...., 5 .

 

  B = LOAD '/user/hue/pig/drivers/drivers/1002/1.csv' using PigStorage(',') AS (x: float, y: float);
  C = FOREACH B
  GENERATE x , Over(x, lead, -1, 0, 1, 0);
  DUMP C;

 

and got this error:  

<file script.pig, line 16, column 13> Failed to generate logical plan. Nested exception:

org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve Over using imports:

[, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

 

I've also tried this:

 B = LOAD '/user/hue/pig/drivers/drivers/1002/1.csv' using PigStorage(',') AS (x: float, y: float);
 C = FOREACH B
 GENERATE x , Over(B.x, lead, -1, 0, 1, 0);
 DUMP C;

But I still got the same error.

 

Can you give me some more specific advice?

 

Thanks !
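ERROR 1070 says Pig could not find a function named Over in its default import list. Over lives in Piggybank (package org.apache.pig.piggybank.evaluation, per the Javadoc link earlier in the thread), so the jar has to be registered and the function defined before it can be called. Below is a minimal, untested sketch. The jar path is hypothetical. The rest rests on a reading of the Over Javadoc that should be checked against the docs: Over takes a bag rather than a scalar column, the function name is a quoted string, and the arguments after the two window values are the number of rows to lag by and a default value.

```pig
-- Hypothetical path: point this at piggybank.jar on your cluster.
REGISTER /path/to/piggybank.jar;
DEFINE Over org.apache.pig.piggybank.evaluation.Over();
DEFINE Stitch org.apache.pig.piggybank.evaluation.Stitch();

B = LOAD '/user/hue/pig/drivers/drivers/1002/1.csv' USING PigStorage(',') AS (x: float, y: float);

-- Over operates on a bag, so collect all rows into a single group.
-- Assumption: the bag keeps file order; GROUP itself does not guarantee this.
G = GROUP B ALL;

-- 'lag' is passed as a string; the remaining arguments are kept from the post
-- (assumed meaning: window start, window end, rows to lag by, default value).
C = FOREACH G GENERATE FLATTEN(Stitch(B, Over(B.x, 'lag', -1, 0, 1, 0)));
DUMP C;
```

GROUP ... ALL sends every row to a single reducer, so this shape only suits files small enough to be handled there.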

Re: How to generate index with Pig ?

Contributor

Can someone help? I've been stuck here for a week.

Or could you just give me a link to a Pig forum?