<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Count values that are filtered - Apache PIG in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145334#M107902</link>
    <description>&lt;P&gt;I might try writing a UDF with custom counters; it sounds like an interesting challenge.&lt;/P&gt;</description>
    <pubDate>Sat, 10 Sep 2016 04:48:32 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2016-09-10T04:48:32Z</dc:date>
    <item>
      <title>Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145326#M107894</link>
      <description>Given this statement:

&lt;OL&gt;&lt;LI&gt;Values = FILTER Input_Data BY Fields &amp;gt; 0;

How can I count the number of records that were filtered out and the number that were kept?


Many thanks!&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Sat, 10 Sep 2016 01:18:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145326#M107894</guid>
      <dc:creator>Stewart12586</dc:creator>
      <dc:date>2016-09-10T01:18:06Z</dc:date>
    </item>
    <item>
      <title>Re: Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145327#M107895</link>
      <description>&lt;P&gt;I can't think of a way to do it in one shot in Pig. If I were to write a MapReduce job for the task, I'd implement a custom counter that gets updated every time the filter is applied: &lt;A href="https://diveintodata.org/2011/03/15/an-example-of-hadoop-mapreduce-counter/" target="_blank"&gt;https://diveintodata.org/2011/03/15/an-example-of-hadoop-mapreduce-counter/&lt;/A&gt;. You can also write a UDF and update custom counters. I haven't tried it, but it's worth a shot: &lt;A href="http://stackoverflow.com/questions/14748120/how-to-increment-hadoop-counters-in-jython-udfs-in-pig" target="_blank"&gt;http://stackoverflow.com/questions/14748120/how-to-increment-hadoop-counters-in-jython-udfs-in-pig&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2016 03:29:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145327#M107895</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-09-10T03:29:20Z</dc:date>
    </item>
    <item>
      <title>Re: Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145328#M107896</link>
      <description>&lt;P&gt;This should work&lt;/P&gt;&lt;PRE&gt;-- split into 2 datasets
SPLIT Input_Data INTO A IF Fields &amp;gt; 0, B IF Fields &amp;lt;= 0;

-- count &amp;gt; 0 records
A_grp = GROUP A ALL;
A_count = FOREACH A_grp GENERATE COUNT(A);

-- count &amp;lt;= 0 records
B_grp = GROUP B ALL;
B_count = FOREACH B_grp GENERATE COUNT(B);&lt;/PRE&gt;&lt;P&gt;See&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#SPLIT"&gt;https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#SPLIT&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A href="http://pig.apache.org/docs/r0.9.2/func.html#count"&gt;http://pig.apache.org/docs/r0.9.2/func.html#count&lt;/A&gt; (note that the code above groups with ALL rather than by a particular field)
&lt;/LI&gt;&lt;LI&gt;&lt;A href="http://www.tutorialspoint.com/apache_pig/apache_pig_count.htm"&gt;http://www.tutorialspoint.com/apache_pig/apache_pig_count.htm&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Sat, 10 Sep 2016 03:45:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145328#M107896</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-09-10T03:45:05Z</dc:date>
    </item>
    <item>
      <title>Re: Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145329#M107897</link>
      <description>&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; What if your filter statement combines multiple OR and AND conditions?&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2016 03:50:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145329#M107897</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-09-10T03:50:17Z</dc:date>
    </item>
    <item>
      <title>Re: Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145330#M107898</link>
      <description>&lt;P&gt;Good question: you can combine multiple conditions in parentheses, e.g.&lt;/P&gt;&lt;P&gt;SPLIT A INTO X IF f1 &amp;lt; 7, Y IF f2 == 5, Z IF (f3 &amp;lt; 6 OR f5 == 0);&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2016 03:54:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145330#M107898</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-09-10T03:54:26Z</dc:date>
    </item>
    <item>
      <title>Re: Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145331#M107899</link>
      <description>&lt;P&gt;That's not the point. You execute COUNT on each filter condition; it's not efficient, but it does answer his question.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2016 04:07:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145331#M107899</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-09-10T04:07:09Z</dc:date>
    </item>
    <item>
      <title>Re: Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145332#M107900</link>
      <description>&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Understood. It's one of those ease-of-development (a few quick Pig lines) vs. highly optimized (custom MapReduce program) questions. It should still be relatively performant in Pig. I think the code above is the only way to do it in Pig.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2016 04:24:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145332#M107900</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-09-10T04:24:47Z</dc:date>
    </item>
    <item>
      <title>Re: Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145333#M107901</link>
      <description>&lt;P&gt;Yup, it's a choice of coding a few lines in Pig vs spending a couple of hours with Java.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2016 04:45:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145333#M107901</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-09-10T04:45:56Z</dc:date>
    </item>
    <item>
      <title>Re: Count values that are filtered - Apache PIG</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145334#M107902</link>
      <description>&lt;P&gt;I might try writing a UDF with custom counters; it sounds like an interesting challenge.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2016 04:48:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Count-values-that-are-filtered-Apache-PIG/m-p/145334#M107902</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-09-10T04:48:32Z</dc:date>
    </item>
  </channel>
</rss>

