<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Pig latin / Order after group by in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-latin-Order-after-group-by/m-p/142415#M32140</link>
    <description>&lt;P&gt;Thank you! This was exactly what I wanted. What was new to me was this:&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC; &lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;So Pig can order groups in that way.&lt;/P&gt;</description>
    <pubDate>Sat, 18 Jun 2016 02:21:16 GMT</pubDate>
    <dc:creator>petri_koski</dc:creator>
    <dc:date>2016-06-18T02:21:16Z</dc:date>
    <item>
      <title>Pig latin / Order after group by</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-latin-Order-after-group-by/m-p/142413#M32138</link>
      <description>&lt;P&gt;I have trouble keeping rows in order. I have data like this:&lt;/P&gt;&lt;P&gt;cata,productx, sales,total_sales_of_category&lt;/P&gt;&lt;P&gt;food bread 112USD,1890USD&lt;/P&gt;&lt;P&gt;food breadX 98USD, 1890USD&lt;/P&gt;&lt;P&gt;Oil MotorOil 786USD,7899USD&lt;/P&gt;&lt;P&gt;Oil MotorOilY 678USD,11331USD&lt;/P&gt;&lt;P&gt;Schema is: chararray,chararray,long,long&lt;/P&gt;&lt;P&gt;Sorry for the lame example, but there are four columns. I can group the rows by category (cata) and order them by sales inside the bag, but if I also want to order them by total_sales_of_category, how am I supposed to do it? Ordering inside the bag works fine:&lt;/P&gt;&lt;PRE&gt;grp = group ordered by $0;
top20 = foreach grp { 
sorted = order ordered by $2 desc;
top = limit sorted 20;
generate group,FLATTEN(top);
};
 &lt;/PRE&gt;&lt;P&gt;But after this, total_sales_of_category is not in order (of course it's not). If I would like to get that in order as well, how can it be done? Simply using x = order top20 by $4 desc will order the rows, but I will lose the order of sales.&lt;/P&gt;&lt;P&gt;Any advice would be great.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jun 2016 21:33:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-latin-Order-after-group-by/m-p/142413#M32138</guid>
      <dc:creator>petri_koski</dc:creator>
      <dc:date>2016-06-16T21:33:31Z</dc:date>
    </item>
    <item>
      <title>Re: Pig latin / Order after group by</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-latin-Order-after-group-by/m-p/142414#M32139</link>
      <description>&lt;P&gt;What I think you are looking for is a list of the N categories with the highest total sales and, for each of those, the top N items sorted by their sales.&lt;/P&gt;&lt;P&gt;For that, I created the following test data, which features 6 categories with 3 products each. As with your data (and to keep the example simple), I left in the total sales per category, but if the data did not have this we could calculate it easily enough. NOTE: your 2 rows of data for the Oil category each had a different category total, so I changed that in my test data.&lt;/P&gt;&lt;PRE&gt;[root@sandbox hcc]# cat raw_sales.txt 
CatZ,Prod22-cZ,30,60
CatA,Prod88-cA,15,50
CatY,Prod07-cY,20,40
CatB,Prod18-cB,10,50
CatX,Prod29-cZ,40,60
CatC,Prod09-cC,80,140
CatZ,Prod83-cZ,20,60
CatA,Prod17-cA,25,50
CatY,Prod98-cY,10,40
CatB,Prod99-cB,30,50
CatX,Prod19-cZ,10,60
CatC,Prod73-cC,50,140
CatZ,Prod52-cZ,10,60
CatA,Prod58-cA,15,50
CatY,Prod57-cY,10,40
CatB,Prod58-cB,10,50
CatX,Prod59-cZ,10,60
CatC,Prod59-cC,10,140&lt;/PRE&gt;&lt;P&gt;With N = 2, the end answer should show CatC (140 total sales) with Prod09-cC &amp;amp; Prod73-cC as well as CatZ (60 total sales) with its Prod22-cZ and Prod83-cZ.&lt;/P&gt;&lt;P&gt;Here's my code. I grouped the rows by category and ordered the groups by total sales so I could throw away all but the top N categories first. After that, it was basically what you had already done.&lt;/P&gt;&lt;PRE&gt;[root@sandbox hcc]# cat salesHCC.pig
rawSales = LOAD 'raw_sales.txt' USING PigStorage(',')
  AS (category: chararray, product: chararray,
      sales: long, total_sales_category: long);
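
-- Sketch only, not part of the original script: if the input had no
-- total_sales_category column, the per-category total could be derived
-- as below. The aliases catGroups and catTotals are hypothetical names,
-- and nothing later in this script uses them.
catGroups = GROUP rawSales BY category;
catTotals = FOREACH catGroups GENERATE
  group AS category, SUM(rawSales.sales) AS total_sales_category;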

-- group them by the total sales / category combos
grpByCatTotals = GROUP rawSales BY
  (total_sales_category, category);

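-- put these groups in order from highest to lowest; the group key is a
-- (total_sales_category, category) tuple, so ties on the total fall back
-- to the category name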
sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC;

-- just keep the top N
topSalesCats = LIMIT sortGrpByCatTotals 2;

-- do your original logic to get the top sales within the categories
topProdsByTopCats = FOREACH topSalesCats {
  sorted = ORDER rawSales BY sales DESC;
  top = LIMIT sorted 2;
  GENERATE group, FLATTEN(top);
}
DUMP topProdsByTopCats;&lt;/PRE&gt;&lt;P&gt;The output is as initially expected.&lt;/P&gt;&lt;PRE&gt;[root@sandbox hcc]# pig -x tez salesHCC.pig 
((140,CatC),CatC,Prod09-cC,80,140)
((140,CatC),CatC,Prod73-cC,50,140)
((60,CatZ),CatZ,Prod22-cZ,30,60)
((60,CatZ),CatZ,Prod83-cZ,20,60)&lt;/PRE&gt;&lt;P&gt;I hope this was what you were looking for.  Either way, good luck!&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jun 2016 22:05:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-latin-Order-after-group-by/m-p/142414#M32139</guid>
      <dc:creator>LesterMartin</dc:creator>
      <dc:date>2016-06-17T22:05:12Z</dc:date>
    </item>
    <item>
      <title>Re: Pig latin / Order after group by</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-latin-Order-after-group-by/m-p/142415#M32140</link>
      <description>&lt;P&gt;Thank you! This was exactly what I wanted. What was new to me was this:&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC; &lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;So Pig can order groups in that way.&lt;/P&gt;</description>
      <pubDate>Sat, 18 Jun 2016 02:21:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-latin-Order-after-group-by/m-p/142415#M32140</guid>
      <dc:creator>petri_koski</dc:creator>
      <dc:date>2016-06-18T02:21:16Z</dc:date>
    </item>
  </channel>
</rss>

