<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Why should we group using Apache PIG in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171860#M37443</link>
    <description>&lt;P&gt;Just top Arun A K &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Many thanks!&lt;/P&gt;</description>
    <pubDate>Thu, 11 Aug 2016 05:15:22 GMT</pubDate>
    <dc:creator>Stewart12586</dc:creator>
    <dc:date>2016-08-11T05:15:22Z</dc:date>
    <item>
      <title>Why should we group using Apache PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171857#M37440</link>
      <description>&lt;P&gt;Hi guys,

I'm very new to Apache Pig, and I already see a lot of scripts that use the GROUP statement without any aggregate operator (like SUM(X) together with a GROUP BY). Why is it a good idea to use the GROUP statement on its own?

Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 21:13:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171857#M37440</guid>
      <dc:creator>Stewart12586</dc:creator>
      <dc:date>2016-08-10T21:13:35Z</dc:date>
    </item>
    <item>
      <title>Re: Why should we group using Apache PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171858#M37441</link>
      <description>&lt;P&gt;GROUP collects records that share the same key. An aggregation does not have to be performed along with the grouping. &lt;/P&gt;&lt;P&gt;For a better understanding, consider a file with ID, name, and age columns, as below: &lt;/P&gt;&lt;PRE&gt;1,John,23
2,James,24
3,Alice,30
4,Bob,23
5,Bill,24&lt;/PRE&gt;
If we apply the script below to the file, which loads it and groups the records by age, all records that share an age are collected into a single group. &lt;PRE&gt;details = LOAD 'file' USING PigStorage(',') as (id:int, name:chararray, age:int);
grouped_data = GROUP details by age;
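-- Sketch, not part of the original answer: the bags built by GROUP can be aggregated in a later step.
-- age_counts and n are illustrative names; this line does not change the dump output shown here.
age_counts = FOREACH grouped_data GENERATE group AS age, COUNT(details) AS n;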
dump grouped_data;&lt;/PRE&gt;&lt;P&gt;The output is:&lt;/P&gt;&lt;PRE&gt;(23,{(1,John,23),(4,Bob,23)})
(24,{(2,James,24),(5,Bill,24)})
(30,{(3,Alice,30)})&lt;/PRE&gt;&lt;P&gt;Furthermore, if you describe the schema of the grouped data, you will see the following:&lt;/P&gt;&lt;PRE&gt;describe grouped_data;
grouped_data: {group: int,details: {(id: int,name: chararray,age: int)}}&lt;/PRE&gt;&lt;P&gt;You can explore more &lt;A target="_blank" href="https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#GROUP"&gt;here&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 01:28:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171858#M37441</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-08-11T01:28:36Z</dc:date>
    </item>
    <item>
      <title>Re: Why should we group using Apache PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171859#M37442</link>
      <description>&lt;P&gt;In addition, you can group by multiple columns, or put every record into a single group with the ALL keyword.&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 01:30:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171859#M37442</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-08-11T01:30:37Z</dc:date>
    </item>
    <item>
      <title>Re: Why should we group using Apache PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171860#M37443</link>
      <description>&lt;P&gt;Just top Arun A K &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Many thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 05:15:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171860#M37443</guid>
      <dc:creator>Stewart12586</dc:creator>
      <dc:date>2016-08-11T05:15:22Z</dc:date>
    </item>
    <item>
      <title>Re: Why should we group using Apache PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171861#M37444</link>
      <description>&lt;P&gt;You are welcome &lt;A rel="user" href="https://community.cloudera.com/users/10082/vilaresantonio.html" nodeid="10082"&gt;@Pedro Rodgers&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2016 06:39:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Why-should-we-group-using-Apache-PIG/m-p/171861#M37444</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-08-11T06:39:27Z</dc:date>
    </item>
  </channel>
</rss>

