Created 09-27-2016 09:33 AM
Hi experts,
I want to rank my dataset, and before or after ranking I also need to group the data. My dataset is:
| EMPLOYEE | STOCK | FURNISHER | DATE | VALUE |
| A | 2 | AA | 27-01-2016 | 3 |
| A | 1 | AB | 28-01-2016 | 3 |
| B | 4 | AA | 27-01-2016 | 5 |
| C | 5 | AC | 27-01-2016 | 1 |
| C | 2 | AC | 27-01-2016 | 4 |
Now I want to rank my data by Employee and Date and group it to obtain the sum of Value for each rank. I know I can do this without ranking, but generating a rank by Employee and Date is a requirement. Basically, I want to produce the following output, where rows with the same Employee and Date share one ID and VALUE holds their summed value:
| ID | EMPLOYEE | STOCK | FURNISHER | DATE | VALUE |
| 1 | A | 2 | AA | 27-01-2016 | 3 |
| 2 | A | 1 | AB | 28-01-2016 | 3 |
| 3 | B | 4 | AA | 27-01-2016 | 5 |
| 4 | C | 5 | AC | 27-01-2016 | 5 |
| 4 | C | 2 | AC | 27-01-2016 | 5 |
To obtain this with Apache Pig, I'm using this script:
RAW = LOAD 'FILE_PATH' USING PigStorage(';') as
(Employee:Chararray, STOCK:Int, FURNISHER:Chararray, Date:Chararray, Value:Double);
RANKING = rank RAW BY Employee, Date;
GRP = GROUP RANKING BY FURNISHER;
DATA = FOREACH GRP GENERATE FLATTEN(RANKING);
STORE DATA INTO 'DESTINATION_PATH' USING PigStorage(',');
But it doesn't return the desired output 😞
Does anyone know how I can do this?
Many thanks!
Created 09-27-2016 01:17 PM
This produces the results you want:
-- Load the semicolon-delimited input
RAW = LOAD 'filepath' USING PigStorage(';') as
(Employee:Chararray, Stock:Int, Furnisher:Chararray, Date:Chararray, Value:Double);
-- Dense rank: each distinct (Employee, Date) pair gets one ID, with no gaps between IDs
RANKING = rank RAW BY Employee, Date DENSE;
-- Group rows by the rank field ($0) and sum Value per rank
GRP = GROUP RANKING BY $0;
SUMMED = foreach GRP {
    summed = SUM(RANKING.Value);
    generate $0, summed as Ranksum;
}
-- Attach the per-rank sum back to every original row
JOINED = join RANKING by $0, SUMMED by $0;
FINAL = foreach JOINED generate $0, Employee, Stock, Furnisher, Date, Ranksum;
STORE FINAL INTO 'destinationpath' USING PigStorage(',');
Let me know whether this is what you are looking for by accepting the answer. If I did not get the requirements right, please clarify.
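A variant that may be worth trying skips the join entirely. This is a sketch and has not been run against the original data: because each group's bag already holds every row for that rank, FLATTEN can re-emit those rows with the group's SUM attached. It assumes Pig names the rank field `rank_RAW` (RANK's `rank_<alias>` naming convention) and reuses the placeholder paths from the answer.

```pig
-- Sketch: 'filepath' and 'destinationpath' are placeholders, as in the answer above.
RAW = LOAD 'filepath' USING PigStorage(';') AS
    (Employee:chararray, Stock:int, Furnisher:chararray, Date:chararray, Value:double);

-- Same dense rank: one ID per distinct (Employee, Date) pair.
-- Assumption: RANK adds the field as rank_RAW (rank_<alias>).
RANKING = RANK RAW BY Employee, Date DENSE;

-- Each group's bag holds all rows that share a rank.
GRP = GROUP RANKING BY rank_RAW;

-- FLATTEN emits one output row per tuple in the bag; SUM yields a single
-- value per group, so every row of that group carries the same Ranksum.
FINAL = FOREACH GRP GENERATE
    FLATTEN(RANKING.(rank_RAW, Employee, Stock, Furnisher, Date)),
    SUM(RANKING.Value) AS Ranksum;

STORE FINAL INTO 'destinationpath' USING PigStorage(',');
```

On the sample data this should give the same rows as the join version (for example, both C rows with ID 4 and a summed value of 5). In both versions, the row order in the stored output is not guaranteed.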