<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: [HIVE] is there a way to perform a &amp;quot;local limit&amp;quot; in a Hive query ? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198490#M160541</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/45257/sebastienfrackowiak.html" nodeid="45257"&gt;@Sebastien F&lt;/A&gt;, are you referring to sampling data &lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling&lt;/A&gt;? I think I need a little more clarification in order to better help.&lt;/P&gt;</description>
    <pubDate>Wed, 28 Mar 2018 20:49:01 GMT</pubDate>
    <dc:creator>SQLShaw</dc:creator>
    <dc:date>2018-03-28T20:49:01Z</dc:date>
    <item>
      <title>[HIVE] is there a way to perform a "local limit" in a Hive query ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198489#M160540</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I was wondering if is there a way to perform a "local limit" in a Hive query.&lt;/P&gt;&lt;P&gt;I explain :&lt;/P&gt;&lt;P&gt;Considering a query that "distribute by" a partition "X".&lt;/P&gt;&lt;P&gt;This partition contains 30 values and I want to have exactly 100 rows per value...&lt;/P&gt;&lt;P&gt;Because, when we perform "limit", generally, this one will break the sink operation at the n-th row, generally only one partition is concerned in that way... And in the aim to build some samples, I think it will be very helpful that reducers (or mappers) can be locally "limited"...&lt;/P&gt;&lt;P&gt;I hope it is clear &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Thanks for your replies.&lt;/P&gt;&lt;P&gt;SF&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 00:47:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198489#M160540</guid>
      <dc:creator>sebastien_frack</dc:creator>
      <dc:date>2018-03-28T00:47:49Z</dc:date>
    </item>
    <item>
      <title>Re: [HIVE] is there a way to perform a "local limit" in a Hive query ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198490#M160541</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/45257/sebastienfrackowiak.html" nodeid="45257"&gt;@Sebastien F&lt;/A&gt;, are you referring to sampling data &lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling&lt;/A&gt;? I think I need a little more clarification in order to better help.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 20:49:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198490#M160541</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2018-03-28T20:49:01Z</dc:date>
    </item>
    <item>
      <title>Re: [HIVE] is there a way to perform a "local limit" in a Hive query ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198491#M160542</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/45257/sebastienfrackowiak.html" nodeid="45257"&gt;@Sebastien F&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Maybe you could try this:&lt;/P&gt;&lt;PRE&gt;with tablePart as (SELECT  ROW_NUMBER() OVER (DISTRIBUTE BY colName SORT BY anotherColName) AS counter, t.*
  FROM
   myTableName t) 
SELECT * FROM tablePart WHERE tablePart.counter &amp;lt;= limit;&lt;/PRE&gt;&lt;P&gt;First of all, I create groups of data via the distribute then I sort them by anotherColName, then I use&lt;STRONG&gt; &lt;/STRONG&gt;row_number&lt;STRONG&gt; &lt;/STRONG&gt;to assign a value to each row of each group as if it was a counter.&lt;/P&gt;&lt;P&gt;Then I select that counter and all the columns of the original table where the value of the local counter is less or equal to my limit.&lt;/P&gt;&lt;P&gt;You could add have you random data in this way:&lt;/P&gt;&lt;PRE&gt;with tablePart as (SELECT  ROW_NUMBER() OVER (DISTRIBUTE BY colName SORT BY rand()) AS counter, t.*
  FROM
   myTableName t) 
SELECT * FROM tablePart WHERE tablePart.counter &amp;lt;= limit;&lt;/PRE&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 30 Mar 2018 22:42:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198491#M160542</guid>
      <dc:creator>joyjedid</dc:creator>
      <dc:date>2018-03-30T22:42:04Z</dc:date>
    </item>
    <item>
      <title>Re: [HIVE] is there a way to perform a "local limit" in a Hive query ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198492#M160543</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/73223/joyjedid.html" nodeid="73223"&gt;@Joy Ndjama&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Awesome ! Exactly what I was expecting.&lt;/P&gt;&lt;P&gt;Even if it is quite expensive, it is a elegant way to get a true sample.&lt;/P&gt;&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/186/sshaw.html" nodeid="186"&gt;@Scott Shaw&lt;/A&gt; as well, TABLESAMPLE is a very interesting functionnality too.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Apr 2018 14:26:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/HIVE-is-there-a-way-to-perform-a-quot-local-limit-quot-in-a/m-p/198492#M160543</guid>
      <dc:creator>sebastien_frack</dc:creator>
      <dc:date>2018-04-04T14:26:41Z</dc:date>
    </item>
  </channel>
</rss>

