Created 09-27-2016 09:33 AM
Hi experts,
I want to rank my dataset, but I also need to group the data, either before or after ranking. My dataset is:
EMPLOYEE | STOCK | FURNISHER | DATE | VALUE |
A | 2 | AA | 27-01-2016 | 3 |
A | 1 | AB | 28-01-2016 | 3 |
B | 4 | AA | 27-01-2016 | 5 |
C | 5 | AC | 27-01-2016 | 1 |
C | 2 | AC | 27-01-2016 | 4 |
Now I want to rank my data by Employee and Date, then group the rows to obtain the sum of Value. I know I can do this without ranking, but generating the rank by Employee and Date is a requirement. Basically, I want to produce the following output:
ID | EMPLOYEE | STOCK | FURNISHER | DATE | VALUE |
1 | A | 2 | AA | 27-01-2016 | 3 |
2 | A | 1 | AB | 28-01-2016 | 3 |
3 | B | 4 | AA | 27-01-2016 | 5 |
4 | C | 5 | AC | 27-01-2016 | 5 |
4 | C | 2 | AC | 27-01-2016 | 5 |
To obtain this with Apache Pig, I'm using this script:
INPUT = LOAD 'FILE_PATH' USING PigStorage(';') as (Employee:Chararray, STOCK:Int, FURNICHER:Chararray, Date:Chararray, Value:Double);
RANKING = rank DATA BY Employee,DATE;
GRP = GROUP RANKING BY FURNISHER;
DATA = FOREACH GRP_by_DATA GENERATE FLATTEN(RANKING);
STORE DATA INTO 'DESTINATION_PATH' USING PigStorage(',');
But it does not return the desired output 😞
Does anyone know how I can do this?
Many thanks!
Created 09-27-2016 01:17 PM
This produces the results you want:
RAW = LOAD 'filepath' USING PigStorage(';') as (Employee:Chararray, Stock:Int, Furnisher:Chararray, Date:Chararray, Value:Double);
RANKING = rank RAW BY Employee, Date DENSE;
GRP = GROUP RANKING BY $0;
SUMMED = foreach GRP {
    summed = SUM(RANKING.Value);
    generate $0, summed as Ranksum;
}
JOINED = join RANKING by $0, SUMMED by $0;
FINAL = foreach JOINED generate $0, Employee, Stock, Furnisher, Date, Ranksum;
STORE FINAL INTO 'destinationpath' USING PigStorage(',');
Let me know if this is what you are looking for by accepting the answer. If I did not get the requirements right, please clarify.
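To check the logic independently of Pig, here is a minimal sketch of the same three steps (dense rank by Employee and Date, sum of Value per rank, join back onto the rows) in plain Python. It is only an illustration and not a replacement for the script: the function name dense_rank_and_sum is made up for this example, the rows are the sample data from the question, and the date is compared as a string, as it would be for a Chararray field.

```python
from collections import defaultdict

# Sample rows from the question: (Employee, Stock, Furnisher, Date, Value)
rows = [
    ("A", 2, "AA", "27-01-2016", 3.0),
    ("A", 1, "AB", "28-01-2016", 3.0),
    ("B", 4, "AA", "27-01-2016", 5.0),
    ("C", 5, "AC", "27-01-2016", 1.0),
    ("C", 2, "AC", "27-01-2016", 4.0),
]


def dense_rank_and_sum(rows):
    """Hypothetical helper: dense-rank by (Employee, Date), then attach the per-rank sum of Value."""
    # Dense rank: every distinct (Employee, Date) key gets the next integer,
    # and rows sharing a key share a rank with no gaps afterwards.
    keys = sorted({(r[0], r[3]) for r in rows})
    rank_of = {key: i + 1 for i, key in enumerate(keys)}

    # Equivalent of GROUP ... BY rank followed by SUM(Value).
    totals = defaultdict(float)
    for r in rows:
        totals[rank_of[(r[0], r[3])]] += r[4]

    # Equivalent of the JOIN: each original row gets its rank and the group total.
    result = []
    for employee, stock, furnisher, date, _value in rows:
        rank = rank_of[(employee, date)]
        result.append((rank, employee, stock, furnisher, date, totals[rank]))
    return result


if __name__ == "__main__":
    for row in dense_rank_and_sum(rows):
        print(row)
```

On the sample data, both rows for employee C on 27-01-2016 get rank 4 and a summed value of 5, which matches the output requested in the question. Unlike this sketch, a Pig join does not guarantee the order of the output rows.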