Created on 02-15-2017 09:11 AM - edited 08-19-2019 04:51 AM
I am new to Pig, so any input is really appreciated. The plays relation does not contain a field named ninetieth. How can we use out_max.ninetieth in step 5?
Created 02-15-2017 12:42 PM
ninetieth is an alias you create in step 4, so out_max.ninetieth is the way to access it. DataFu comes standard as part of HDP, at version 1.3; please do not use the 0.4 version. You can find ours in /usr/hdp/current/pig-client/lib.
Created 02-15-2017 01:04 PM
Thanks @Artem Ervits. The plays relation does not contain the alias ninetieth, and I understand that it is generated in step 4. How can we use ninetieth in step 5, since plays does not contain that alias?
trim_outliers = foreach plays generate (here we need to select an alias from plays)
Can I select an alias from any relation in a foreach ... generate statement?
Created 02-15-2017 01:37 PM
That's why you reference it as out_max.ninetieth: the field comes from a separate relation (out_max), not from plays.
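To make the point concrete, here is a minimal sketch of how a field from one relation can be used inside a foreach over another. The Pig manual describes this under "Casting Relations to Scalars": when a relation holds exactly one tuple, relation.field can be used as a scalar value. The file name, the field names (user, play_count, max_plays) and the use of MAX instead of the DataFu quantile from the original steps are all assumptions made for illustration, not the script from the question.

```pig
-- Hypothetical input: one tab-separated (user, play_count) pair per line.
plays = LOAD 'plays.tsv' AS (user:chararray, play_count:long);

-- GROUP ... ALL puts every row into a single group, so the aggregate
-- below produces a relation with exactly one tuple.
grouped = GROUP plays ALL;
out_max = FOREACH grouped GENERATE MAX(plays.play_count) AS max_plays;

-- out_max has a single tuple, so out_max.max_plays is treated as a scalar
-- and can appear in a FOREACH over the unrelated relation plays.
scaled = FOREACH plays GENERATE
    user,
    play_count,
    (double)play_count / out_max.max_plays AS share_of_max;

DUMP scaled;
```

If out_max ever contained more than one tuple, the scalar reference would fail at run time, which is why the aggregate is computed over GROUP ... ALL.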
Created 02-15-2017 03:05 PM
From the Pig textbook:
Tuple:
A = load 'input' as (t:tuple(x:int, y:int));
B = foreach A generate t.x, t.$1;
Bag:
When you project fields in a bag, you are creating a new bag with only those fields:
A = load 'input' as (b:bag{t:(x:int, y:int)});
B = foreach A generate b.x;
This will produce a new bag whose tuples have only the field x in them.
How can I get more information about this, I mean about referencing a relation with the dot operator (out_max.ninetieth)? I could not find anything in the Pig manual. Any input on this is appreciated.
Created 02-15-2017 03:12 PM
@vamsi valiveti you're overthinking it; it is just an alias. Here's more: https://pig.apache.org/docs/r0.16.0/basic.html
You can assign an alias to another alias. The new alias can be used in place of the original alias to refer to the original relation.
As noted, the fields in a tuple can be any data type, including the complex data types: bags, tuples, and maps.
In this example the data file contains tuples. A schema for complex data types (in this case, tuples) is used to load the data. Then, dereference operators (the dot in t1.t1a and t2.$0) are used to access the fields in the tuples. Note that when you assign names to fields you can still refer to these fields using positional notation.
cat data;
(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)
A = LOAD 'data' AS (t1:tuple(t1a:int, t1b:int,t1c:int),t2:tuple(t2a:int,t2b:int,t2c:int));
DUMP A;
((3,8,9),(4,5,6))
((1,4,7),(3,7,5))
((2,5,8),(9,5,8))
X = FOREACH A GENERATE t1.t1a,t2.$0;
DUMP X;
(3,4)
(1,3)
(2,9)