<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Invalid field projection error in Pig script in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185771#M147878</link>
    <description>Invalid field projection error in Pig script</description>
    <pubDate>Wed, 14 Mar 2018 08:19:36 GMT</pubDate>
    <dc:creator>abhijnan_kundu</dc:creator>
    <dc:date>2018-03-14T08:19:36Z</dc:date>
    <item>
      <title>Invalid field projection error in Pig script</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185771#M147878</link>
      <description>&lt;P&gt;I have a &lt;STRONG&gt;dataset&lt;/STRONG&gt; as below:&lt;/P&gt;&lt;P&gt;abhi,34,brown,5&lt;/P&gt;&lt;P&gt;john,35,green,6&lt;/P&gt;&lt;P&gt;amy,30,brown,6&lt;/P&gt;&lt;P&gt;Steve,38,blue,6&lt;/P&gt;&lt;P&gt;Brett,35,brown,6&lt;/P&gt;&lt;P&gt;Andy,34,brown,6&lt;/P&gt;&lt;P&gt;The layout of the data set is: name, age, eye color, height.&lt;/P&gt;&lt;P&gt;For each age group, I want the total number of people, the average height, and the number of people with brown, green and blue eyes. The result should look like this:&lt;/P&gt;&lt;P&gt;34,2,5.5,2,0,0&lt;/P&gt;&lt;P&gt;35,2,6.0,1,1,0&lt;/P&gt;&lt;P&gt;and so on.&lt;/P&gt;&lt;P&gt;The format of the result set is:&lt;/P&gt;&lt;P&gt;&amp;lt;age&amp;gt;, total number of people of that age, average height in that age group, number of brown eyes in that age group, number of green eyes in that age group, number of blue eyes in that age group.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;My script is as below:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;grunt&amp;gt; my_data = LOAD 'customers.txt' using PigStorage()
&amp;gt;&amp;gt;   as (name:chararray, age:int, eye_color:chararray, height:int);

grunt&amp;gt; my_data = FOREACH my_data
&amp;gt;&amp;gt;                GENERATE name, age, height,
&amp;gt;&amp;gt;                (eye_color == 'brown' ? 1 : 0) AS brown_eyes,
&amp;gt;&amp;gt;                (eye_color == 'blue'  ? 1 : 0) AS blue_eyes,
&amp;gt;&amp;gt;                (eye_color == 'green' ? 1 : 0 ) AS green_eyes;

grunt&amp;gt; by_age = group my_data by age;

grunt&amp;gt; final_data = FOREACH by_age GENERATE
&amp;gt;&amp;gt;     group as age,
&amp;gt;&amp;gt;     COUNT(my_data) as num_people,
&amp;gt;&amp;gt;     AVG(my_data.height) as avg_height,
&amp;gt;&amp;gt;     SUM(brown_eyes) as num_brown_eyes,
&amp;gt;&amp;gt;     SUM(blue_eyes) as num_blue_eyes,
&amp;gt;&amp;gt;     SUM(green_eyes) as num_green_eyes;&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;I get the error below after the last line of the script is executed:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;2018-03-14 00:44:54,181 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: &amp;lt;line 22, column 8&amp;gt; Invalid field projection. Projected field [brown_eyes] does not exist in schema: group:int,my_data:bag{:tuple(name:chararray,age:int,height:int,brown_eyes:int,blue_eyes:int,green_eyes:int)}.&lt;/P&gt;&lt;P&gt;The schema of the by_age relation clearly shows that it contains the field brown_eyes, so why am I still getting this error, and how can I resolve it?&lt;/P&gt;</description>
      <pubDate>Wed, 14 Mar 2018 08:19:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185771#M147878</guid>
      <dc:creator>abhijnan_kundu</dc:creator>
      <dc:date>2018-03-14T08:19:36Z</dc:date>
    </item>
    <item>
      <title>Re: Invalid field projection error in Pig script</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185772#M147879</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/70304/abhijnankundu.html" nodeid="70304"&gt;@Abhijnan Kundu&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Can you please modify the script as below and try running it?&lt;/P&gt;&lt;P&gt;I have changed PigStorage() to PigStorage(',') so that the comma-separated file is split into fields, and used my_data.brown_eyes instead of brown_eyes, because after the GROUP the columns sit inside the my_data bag.&lt;/P&gt;&lt;PRE&gt;my_data = LOAD 'customers.txt' using PigStorage(',') as (name:chararray, age:int, eye_color:chararray, height:int);
my_data = FOREACH my_data GENERATE name, age, height, (eye_color == 'brown' ? 1 : 0) AS brown_eyes, (eye_color == 'blue' ? 1 : 0) AS blue_eyes, (eye_color == 'green' ? 1 : 0) AS green_eyes;
by_age = group my_data by age;
final_data = FOREACH by_age GENERATE group as age, COUNT(my_data) as num_people, AVG(my_data.height) as avg_height, SUM(my_data.brown_eyes) as num_brown_eyes, SUM(my_data.blue_eyes) as num_blue_eyes, SUM(my_data.green_eyes) as num_green_eyes;&lt;/PRE&gt;&lt;P&gt;If this worked for you, please click on the Accept button to accept the answer. This will be helpful for other community users.&lt;/P&gt;&lt;P&gt;-Aditya&lt;/P&gt;</description>
      <pubDate>Wed, 14 Mar 2018 13:09:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185772#M147879</guid>
      <dc:creator>asirna</dc:creator>
      <dc:date>2018-03-14T13:09:07Z</dc:date>
    </item>
    <item>
      <title>Re: Invalid field projection error in Pig script</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185773#M147880</link>
      <description>&lt;P&gt;There are a couple of issues that I can see with your script.&lt;/P&gt;&lt;P&gt;The first is in the statement that reads the data from the file.&lt;/P&gt;&lt;PRE&gt;my_data = LOAD 'customers.txt' using PigStorage() as (name:chararray, age:int, eye_color:chararray, height:int);&lt;/PRE&gt;&lt;P&gt;You called PigStorage() without any parameter. If you don't pass a parameter, it uses TAB as the delimiter. Your data file, however, is comma-delimited, so your LOAD statement should look as follows.&lt;/P&gt;&lt;PRE&gt;my_data = LOAD 'customers.txt' using PigStorage(',') as (name:chararray, age:int, eye_color:chararray, height:int);&lt;/PRE&gt;&lt;P&gt;This is not the cause of the error you are seeing, though. In your last statement, where you create the final_data relation, you referred to your columns as&lt;/P&gt;&lt;PRE&gt;SUM(brown_eyes) as num_brown_eyes, SUM(blue_eyes) as num_blue_eyes, SUM(green_eyes) as num_green_eyes&lt;/PRE&gt;&lt;P&gt;This is incorrect. A describe statement should explain the schema to you.&lt;/P&gt;&lt;PRE&gt;grunt&amp;gt; describe by_age;

by_age: {group: int,my_data: {(name: chararray,age: int,height: int,brown_eyes: int,blue_eyes: int,green_eyes: int)}}&lt;/PRE&gt;&lt;P&gt;You can see that all the columns are nested inside the my_data bag, so they must be referenced as shown below.&lt;/P&gt;&lt;PRE&gt;SUM(my_data.brown_eyes) as num_brown_eyes, SUM(my_data.blue_eyes) as num_blue_eyes, SUM(my_data.green_eyes) as num_green_eyes&lt;/PRE&gt;&lt;P&gt;This is the same way you used my_data.height in your code.&lt;/P&gt;&lt;P&gt;So your final GENERATE statement should look as follows.&lt;/P&gt;&lt;PRE&gt;final_data = FOREACH by_age GENERATE group as age, COUNT(my_data) as num_people, AVG(my_data.height) as avg_height, SUM(my_data.brown_eyes) as num_brown_eyes, SUM(my_data.blue_eyes) as num_blue_eyes, SUM(my_data.green_eyes) as num_green_eyes;&lt;/PRE&gt;&lt;P&gt;All in all, your complete script should look as shown below.&lt;/P&gt;&lt;PRE&gt;my_data = LOAD 'customers.txt' using PigStorage(',') as (name:chararray, age:int, eye_color:chararray, height:int);

my_data = FOREACH my_data GENERATE name, age, height, (eye_color == 'brown' ? 1 : 0) AS brown_eyes, (eye_color == 'blue' ? 1 : 0) AS blue_eyes, (eye_color == 'green' ? 1 : 0) AS green_eyes;

by_age = group my_data by age;

final_data = FOREACH by_age GENERATE group as age, COUNT(my_data) as num_people, AVG(my_data.height) as avg_height, SUM(my_data.brown_eyes) as num_brown_eyes, SUM(my_data.blue_eyes) as num_blue_eyes, SUM(my_data.green_eyes) as num_green_eyes;&lt;/PRE&gt;&lt;P&gt;Now that you know what the issues were, you will be able to run your script and also prevent those "typos" in future!&lt;/P&gt;&lt;P&gt;Happy coding!&lt;/P&gt;</description>
      <pubDate>Wed, 14 Mar 2018 23:30:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185773#M147880</guid>
      <dc:creator>RahulSoni</dc:creator>
      <dc:date>2018-03-14T23:30:40Z</dc:date>
    </item>
    <item>
      <title>Re: Invalid field projection error in Pig script</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185774#M147881</link>
      <description>&lt;P&gt;Thanks a lot, Aditya and Rahul! I executed the corrected code and got the desired output!&lt;/P&gt;&lt;P&gt;(30,1,6.0,1,0,0)&lt;/P&gt;&lt;P&gt;(34,2,5.5,2,0,0)&lt;/P&gt;&lt;P&gt;(35,3,5.333333333333333,1,0,2)&lt;/P&gt;&lt;P&gt;(38,1,6.0,0,1,0)&lt;/P&gt;&lt;P&gt;Thanks again for explaining the mistake I was making and correcting it!&lt;/P&gt;</description>
      <pubDate>Thu, 15 Mar 2018 03:48:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185774#M147881</guid>
      <dc:creator>abhijnan_kundu</dc:creator>
      <dc:date>2018-03-15T03:48:37Z</dc:date>
    </item>
    <item>
      <title>Re: Invalid field projection error in Pig script</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185775#M147882</link>
      <description>&lt;P&gt;Thanks a lot, Aditya! I executed the corrected code and got the desired output!&lt;/P&gt;&lt;P&gt;(30,1,6.0,1,0,0)&lt;/P&gt;&lt;P&gt;(34,2,5.5,2,0,0)&lt;/P&gt;&lt;P&gt;(35,3,5.333333333333333,1,0,2)&lt;/P&gt;&lt;P&gt;(38,1,6.0,0,1,0)&lt;/P&gt;&lt;P&gt;Thanks again for explaining the mistake I was making and correcting it!&lt;/P&gt;</description>
      <pubDate>Thu, 15 Mar 2018 06:34:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185775#M147882</guid>
      <dc:creator>abhijnan_kundu</dc:creator>
      <dc:date>2018-03-15T06:34:33Z</dc:date>
    </item>
    <item>
      <title>Re: Invalid field projection error in Pig script</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185776#M147883</link>
      <description>&lt;P&gt;I executed the corrected code and got the following output:&lt;BR /&gt;&lt;BR /&gt;(30,1,6.0,1,0,0)&lt;BR /&gt;&lt;BR /&gt;(34,2,5.5,2,0,0)&lt;BR /&gt;&lt;BR /&gt;(35,2,6.0,1,0,1)&lt;BR /&gt;&lt;BR /&gt;(38,1,6.0,0,1,0)&lt;BR /&gt;&lt;BR /&gt;The output that others posted in this thread is given below.&lt;BR /&gt;&lt;BR /&gt;(30,1,6.0,1,0,0)&lt;BR /&gt;&lt;BR /&gt;(34,2,5.5,2,0,0)&lt;BR /&gt;&lt;BR /&gt;(35,3,5.333333333333333,1,0,2)&lt;BR /&gt;&lt;BR /&gt;(38,1,6.0,0,1,0)&lt;BR /&gt;&lt;BR /&gt;Why is there a difference in the third row? I would appreciate it if anyone could explain it.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Dec 2018 21:14:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invalid-field-projection-error-in-Pig-script/m-p/185776#M147883</guid>
      <dc:creator>dineshughade</dc:creator>
      <dc:date>2018-12-12T21:14:58Z</dc:date>
    </item>
  </channel>
</rss>

