
Pig condition clarification


I am new to Pig, so any input is really appreciated. The plays relation does not contain a field named ninetieth. How can we use out_max.ninetieth in step 5?




ninetieth is an alias you create in step 4, so out_max.ninetieth is a way to access it. DataFu comes standard as part of HDP and is at version 1.3; please do not use the 0.4 version. You can find ours in /usr/hdp/current/pig-client/lib.


Thanks @Artem Ervits. The plays relation does not contain the alias ninetieth, and I understand that it is generated in step 4. How can we use ninetieth in step 5, since plays does not contain that alias?

trim_outliers = foreach plays generate (here we need to select an alias from plays)

Can I select an alias from any relation in a foreach ... generate statement?


That's why you reference it as out_max.ninetieth: the field comes from a separate relation (out_max), not from plays.
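
To make this concrete, here is a minimal sketch of the pattern, not the original script from this thread. The file name, the schema of plays, the field names, and the use of the built-in MAX in place of the DataFu quantile call are all assumptions made for illustration. It relies on Pig's ability to treat a relation that holds exactly one tuple as a scalar, which is what the relation.field syntax does here:

```pig
-- Hypothetical input and schema; the real plays relation may differ.
plays = LOAD 'plays' AS (user:chararray, cnt:int);

-- GROUP ... ALL puts every row into a single group, so the relation
-- built from it below contains exactly one tuple.
grpd = GROUP plays ALL;

-- Stand-in for step 4: compute one summary value and give it a name.
-- (MAX is an assumption here; the thread uses a DataFu quantile UDF.)
out_max = FOREACH grpd GENERATE MAX(plays.cnt) AS max_cnt;

-- Stand-in for step 5: out_max.max_cnt is read as a scalar for every
-- row of plays, even though plays itself has no max_cnt field.
scaled = FOREACH plays GENERATE user,
         (double)cnt / (double)out_max.max_cnt AS ratio_to_max;
```

The scalar form only works when the referenced relation has a single tuple. If out_max held more than one row, Pig would be expected to fail at run time rather than pick one.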


From the Pig textbook:


A = load 'input' as (t:tuple(x:int, y:int));
B = foreach A generate t.x, t.$1;


When you project fields in a bag, you are creating a new bag with only those fields:
A = load 'input' as (b:bag{t:(x:int, y:int)});
B = foreach A generate b.x;
This will produce a new bag whose tuples have only the field x in them.
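
As a small illustration of that bag projection, the sketch below shows what one row might look like before and after. The input line is hypothetical, and the output in the comment is worked out by hand rather than copied from a real run:

```pig
-- Hypothetical single line in 'input' (one bag field):
--   {(1,2),(3,4)}
A = LOAD 'input' AS (b:bag{t:(x:int, y:int)});

-- b.x keeps only field x of every tuple inside the bag.
B = FOREACH A GENERATE b.x;

DUMP B;
-- ({(1),(3)})
```

Note the difference from out_max.ninetieth: here b is a bag-typed field inside a row, so the result is still a bag, whereas a dot applied to a single-row relation yields a plain value.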

How can I get more information about this, that is, referencing a relation with the dot operator (out_max.ninetieth)? I could not find anything in the Pig manual. Any input on this is appreciated.


@vamsi valiveti you're overthinking it; it is just an alias. Here's more:

You can assign an alias to another alias. The new alias can be used in place of the original alias to refer to the original relation.

Referencing Fields that are Complex Data Types

As noted, the fields in a tuple can be any data type, including the complex data types: bags, tuples, and maps.

  • Use the schemas for complex data types to name fields that are complex data types.
  • Use the dereference operators to reference and work with fields that are complex data types.

In this example the data file contains tuples. A schema for complex data types (in this case, tuples) is used to load the data. Then, dereference operators (the dot in t1.t1a and t2.$0) are used to access the fields in the tuples. Note that when you assign names to fields you can still refer to these fields using positional notation.

cat data;
(3,8,9) (4,5,6)
(1,4,7) (3,7,5)
(2,5,8) (9,5,8)

A = LOAD 'data' AS (t1:tuple(t1a:int, t1b:int, t1c:int), t2:tuple(t2a:int, t2b:int, t2c:int));


X = FOREACH A GENERATE t1.t1a,t2.$0;
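
To illustrate the point that names and positions are interchangeable, the sketch below swaps the two notations and should yield the same values as X. The relation name Y is made up for this example, the expected rows are worked out by hand from the three sample lines, and it assumes the two tuples on each line are separated by a tab (the default delimiter for LOAD):

```pig
A = LOAD 'data' AS (t1:tuple(t1a:int, t1b:int, t1c:int),
                    t2:tuple(t2a:int, t2b:int, t2c:int));

-- Same projection as X, but t1 is addressed by position
-- and t2 by field name.
Y = FOREACH A GENERATE t1.$0, t2.t2a;

DUMP Y;
-- (3,4)
-- (1,3)
-- (2,9)
```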
