Hi experts,
I want to rank my dataset, but I also need to group the data, either before or after ranking. My dataset is:
EMPLOYEE | STOCK | FURNISHER | DATE | VALUE |
A | 2 | AA | 27-01-2016 | 3 |
A | 1 | AB | 28-01-2016 | 3 |
B | 4 | AA | 27-01-2016 | 5 |
C | 5 | AC | 27-01-2016 | 1 |
C | 2 | AC | 27-01-2016 | 4 |
Now I want to rank my data by Employee and Date and group the rows to obtain the sum of Value. I know I can get the sum without ranking, but generating the rank by Employee and Date is a requirement. Basically, I want to produce the following output:
ID | EMPLOYEE | STOCK | FURNISHER | DATE | VALUE |
1 | A | 2 | AA | 27-01-2016 | 3 |
2 | A | 1 | AB | 28-01-2016 | 3 |
3 | B | 4 | AA | 27-01-2016 | 5 |
4 | C | 5 | AC | 27-01-2016 | 5 |
4 | C | 2 | AC | 27-01-2016 | 5 |
To obtain this using Apache Pig, I'm using this script:
DATA = LOAD 'FILE_PATH' USING PigStorage(';') AS
    (Employee:chararray, STOCK:int, FURNISHER:chararray, Date:chararray, Value:double);
RANKING = RANK DATA BY Employee, Date;
GRP = GROUP RANKING BY FURNISHER;
RESULT = FOREACH GRP GENERATE FLATTEN(RANKING);
STORE RESULT INTO 'DESTINATION_PATH' USING PigStorage(',');
But this does not return the desired output 😞
Does anyone know how I can do this?
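My untested guess is that I need a dense rank on (Employee, Date), so that tied rows share an ID, followed by a GROUP on the same two fields so that SUM covers each pair. Something like the sketch below, where the aliases RANKED, GRP and RESULT are names I made up, and where I am assuming that RANK ... DENSE is available (Pig 0.11 or later) and that it names its new column rank_DATA:

```pig
-- Sketch only: the aliases and both paths are placeholders.
DATA = LOAD 'FILE_PATH' USING PigStorage(';') AS
    (Employee:chararray, STOCK:int, FURNISHER:chararray, Date:chararray, Value:double);

-- DENSE should give tied (Employee, Date) rows the same rank, with no gaps after a tie.
RANKED = RANK DATA BY Employee, Date DENSE;

-- Group on the ranking keys so SUM sees every row of one (Employee, Date) pair.
GRP = GROUP RANKED BY (Employee, Date);

-- Flatten the rows back out and replace each row's Value with the group total.
RESULT = FOREACH GRP GENERATE
    FLATTEN(RANKED.(rank_DATA, Employee, STOCK, FURNISHER, Date)),
    SUM(RANKED.Value) AS Value;

STORE RESULT INTO 'DESTINATION_PATH' USING PigStorage(',');
```

If this is roughly right, the rows would come out grouped rather than sorted, so an ORDER BY on the rank column might still be needed before the STORE.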
Many thanks!