Created 11-28-2016 01:24 PM
I am new to Pig, and any input is really appreciated.
Source file:
Exchange,Symbol,date,open,high,low,close,volume,adj_close
NASDAQ,JDAS,2010-01-29,26.91,27.53,26.02,26.21,883100,26.21
NASDAQ,JDAS,2010-01-28,29.86,27.97,26.84,26.88,1272600,26.88
NASDAQ,JDAS,2010-01-27,27.48,27.93,27.20,27.68,560100,27.68
ICICI,JDAS,2010-02-08,25.41,26.59,25.15,26.46,488900,26.46
ICICI,JDAS,2010-01-29,26.91,27.53,26.02,26.21,883100,26.21
ICICI,JDAS,2010-01-28,27.86,27.97,26.84,26.88,1272600,26.88
NASDAQ,JDAS,2010-01-29,26.91,27.53,26.02,26.21,883100,26.21
NASDAQ,JDAS,2010-01-28,27.86,27.97,26.84,26.88,1272600,26.88
NASDAQ,JDAS,2010-01-27,27.48,27.93,27.20,27.68,560100,27.68
NASDAQ,JDAS,2010-02-08,25.41,26.59,25.15,26.46,488900,26.46
NASDAQ,JDAS,2010-02-05,25.42,25.84,24.94,25.49,1121700,25.49
NASDAQ,JDAS,2010-02-04,26.53,26.61,25.46,25.46,574900,25.46
NASDAQ,JDAS,2009-12-31,25.97,26.13,25.47,25.47,283600,25.47
NASDAQ,JDAS,2009-12-30,25.74,26.25,25.61,26.05,236300,26.05
NASDAQ,JDAS,2009-12-29,25.98,25.98,25.52,25.76,238600,25.76
NASDAQ,JDAS,2009-11-30,23.39,23.65,22.78,23.48,522000,23.48
NASDAQ,JDAS,2009-11-27,23.12,23.71,23.10,23.54,144900,23.54
NASDAQ,JDAS,2009-11-25,23.96,24.00,23.59,23.82,220400,23.82
NASDAQ,JOEZ,2010-01-29,1.68,1.69,1.60,1.60,158900,1.60
NASDAQ,JOEZ,2010-01-28,1.64,1.70,1.61,1.62,250700,1.62
NASDAQ,JOEZ,2010-01-27,1.73,1.76,1.63,1.64,329200,1.64
NASDAQ,JOEZ,2010-01-26,1.70,1.76,1.66,1.70,509100,1.70
NASDAQ,JOEZ,2010-01-25,1.64,1.68,1.60,1.68,169600,1.68
NASDAQ,JOEZ,2010-02-08,1.80,2.04,1.76,1.93,1712200,1.93
NASDAQ,JOEZ,2010-02-05,1.84,1.88,1.70,1.80,1044700,1.80
NASDAQ,JOEZ,2010-02-04,1.96,1.97,1.74,1.88,3758600,1.88
NASDAQ,JOEZ,2010-02-03,1.73,1.79,1.68,1.72,1211700,1.72
NASDAQ,JOEZ,2010-02-02,1.59,1.72,1.51,1.70,909400,1.70
NASDAQ,JOEZ,2009-07-15,1.00,1.05,0.75,0.81,1215200,0.81
NASDAQ,JOEZ,2009-07-14,0.80,0.95,0.80,0.93,580000,0.93
NASDAQ,JOEZ,2009-07-13,0.80,0.83,0.75,0.79,148100,0.79
NASDAQ,JOEZ,2009-05-06,0.56,0.67,0.55,0.58,83800,0.58
NASDAQ,JOEZ,2009-05-05,0.63,0.63,0.58,0.58,68700,0.58
NASDAQ,JOEZ,2009-05-04,0.62,0.68,0.60,0.63,134400,0.63
x = LOAD '/home/prime23/source.txt' using PigStorage(',') As (exchange:chararray, symbol:chararray, date:chararray, open:double, high:double, low:double, close:double, volume:long, adj_close:double);
Query: for each symbol, get all distinct exchanges:
y = GROUP x by symbol;
z1 = foreach y {
    t = distinct x.exchange;
    generate group, t;
}
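A minimal end-to-end version of the script above, as a sketch. It assumes the header line shown in the source file is literally the first line of source.txt: PigStorage does not skip header lines, so that line would be loaded as data (with symbol 'Symbol') and would add a third group to y. The FILTER step and the alias x_data are additions for this illustration, not part of the original script.

```pig
-- Load the CSV; the schema matches the columns of source.txt.
x = LOAD '/home/prime23/source.txt' USING PigStorage(',')
    AS (exchange:chararray, symbol:chararray, date:chararray,
        open:double, high:double, low:double, close:double,
        volume:long, adj_close:double);

-- Assumption: source.txt starts with a header line; drop it so it is not grouped as data.
x_data = FILTER x BY exchange != 'Exchange';

-- One tuple per symbol; the bag inside each tuple is named after the grouped relation (x_data).
y = GROUP x_data BY symbol;

-- The nested block runs once per tuple of y, i.e. once per symbol.
z1 = FOREACH y {
    t = DISTINCT x_data.exchange;   -- distinct exchanges within this symbol's bag
    GENERATE group, t;
}

DUMP z1;
```

For the sample data, each output tuple should hold a symbol and a bag of its exchanges (ICICI and NASDAQ for JDAS, only NASDAQ for JOEZ). Neither the order of the tuples nor the order inside the bags is guaranteed.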
Clarifications:
1) There are two symbols here (JOEZ, JDAS), so the nested foreach will iterate twice. Please correct me if I am wrong.
2) How can I get the schema of the t relation? describe is not working.
3) The last statement is not clear:
relation y contains only the fields (group, x). How can we select the field t, which is not present in relation y?
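For clarification 2, a sketch of how the schemas can be inspected, reusing the LOAD statement from the post. t is an alias local to z1's nested block, which is why a top-level `describe t;` cannot find it. The Pig documentation for DESCRIBE shows a `relation::nested_alias` form for aliases defined inside a nested FOREACH; treat that last line as something to verify against your Pig version.

```pig
x = LOAD '/home/prime23/source.txt' USING PigStorage(',')
    AS (exchange:chararray, symbol:chararray, date:chararray,
        open:double, high:double, low:double, close:double,
        volume:long, adj_close:double);
y = GROUP x BY symbol;
z1 = FOREACH y {
    t = DISTINCT x.exchange;
    GENERATE group, t;
}

DESCRIBE y;     -- two fields: group (the symbol) and x (a bag of that symbol's input tuples)
DESCRIBE z1;    -- two fields: group and t (a bag of single-field exchange tuples)
DESCRIBE z1::t; -- schema of the nested alias t itself
```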
Created 11-28-2016 03:31 PM
Why not group by both symbol and exchange?
y = GROUP x BY (symbol, exchange);
z1 = FOREACH y GENERATE group.symbol, group.exchange;
Created 11-28-2016 03:55 PM
Thanks for the input. Please see my clarifications in the original post; I am looking for input on those points.
Created 11-28-2016 03:59 PM
I don't understand what you are trying to do here. Are you trying to get a flat list of (symbol, exchange) pairs? For example:
JDAS,NASDAQ
JDAS,ICICI
JOEZ,NASDAQ
Your clarifications don't make clear what you are trying to achieve.
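Flat pairs like these can also come straight out of the original nested FOREACH by flattening t. A sketch, reusing the post's LOAD statement and assuming no header line is loaded as data; pairs, symbol and exchange are illustrative output names.

```pig
x = LOAD '/home/prime23/source.txt' USING PigStorage(',')
    AS (exchange:chararray, symbol:chararray, date:chararray,
        open:double, high:double, low:double, close:double,
        volume:long, adj_close:double);
y = GROUP x BY symbol;
pairs = FOREACH y {
    t = DISTINCT x.exchange;                          -- distinct exchanges for this symbol
    GENERATE group AS symbol, FLATTEN(t) AS exchange; -- one output row per element of t
}
DUMP pairs;
```

For the sample data this should give the three pairs listed above, in no guaranteed order.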
Created 11-28-2016 04:07 PM
I want to get the distinct exchanges for each symbol.
I already have a Pig script for that, but I have some clarifications, which are listed in the original post.
Created 11-29-2016 09:08 AM
Hi friends,
Any input on my clarifications is appreciated, since I am a beginner with Hadoop and Pig.
Created 11-29-2016 04:47 PM
1) It will loop once for each distinct symbol in the data; here there are 2, so yes, 2 times.
2) describe t fails because t is not a top-level relation: it is an alias local to the nested foreach block, so it only exists inside that block. DISTINCT is a relational operator, but inside a nested foreach it is allowed on a bag such as x.exchange. If you would rather do the distinct at the top level, project the pairs first:
pairs = foreach x generate symbol, exchange;
unique_pairs = distinct pairs;
3) generate group, t; produces one tuple per group: the key you grouped on (group) together with the bag t. If you want a separate row for each exchange, use generate group, flatten(t); which pairs the key with each element of t (a cross product of group with the contents of t).
Hope this helps.
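One way to check point 1 on real data is to count the tuples of y, since the nested block runs once per tuple of y. A sketch reusing the post's LOAD and GROUP statements; all_groups and n_groups are hypothetical names.

```pig
x = LOAD '/home/prime23/source.txt' USING PigStorage(',')
    AS (exchange:chararray, symbol:chararray, date:chararray,
        open:double, high:double, low:double, close:double,
        volume:long, adj_close:double);
y = GROUP x BY symbol;                                 -- one tuple per distinct symbol
all_groups = GROUP y ALL;                              -- a single tuple whose bag (named y) holds every tuple of y
n_groups = FOREACH all_groups GENERATE COUNT(y) AS n;  -- number of groups = number of nested-block runs
DUMP n_groups;
```

With the sample data this should report 2, or 3 if the header line is loaded as data.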
Created 11-30-2016 08:51 AM
Thanks for your time. The last point is still not clear to me.
generate group, t; is the same as foreach y generate group, t;
Relation y contains only two columns (group, x). How can you select t? Of course, t is calculated inside the nested foreach.
Can we select any column (like t) in the generate statement of a nested foreach?
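On the last question: inside the nested block, Pig processes one tuple of y at a time, so group and x refer to that tuple's fields, and aliases such as t are computed from them and are visible to the block's GENERATE (and nowhere else). That is why generate group, t; inside the block is not the same as a top-level foreach y generate group, t;, which would fail because t is not a field of y. A sketch with a second, hypothetical nested alias u to make the scope visible, reusing the post's LOAD statement:

```pig
x = LOAD '/home/prime23/source.txt' USING PigStorage(',')
    AS (exchange:chararray, symbol:chararray, date:chararray,
        open:double, high:double, low:double, close:double,
        volume:long, adj_close:double);
y = GROUP x BY symbol;
z3 = FOREACH y {
    -- group and x are the fields of the current tuple of y.
    t = DISTINCT x.exchange;              -- local alias: distinct exchanges for this symbol
    u = FILTER x BY exchange == 'NASDAQ'; -- local alias (hypothetical): this symbol's NASDAQ rows
    -- GENERATE may use the fields of y (group) and the local aliases (t, u).
    GENERATE group, t, COUNT(u) AS nasdaq_rows;
}
-- This would fail, because t is not a field of y:
-- bad = FOREACH y GENERATE group, t;
DUMP z3;
```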