<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive query issue in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106849#M69725</link>
    <description>&lt;P&gt;Can you also post the time taken to execute the 20 mi records? thnx ! &lt;/P&gt;</description>
    <pubDate>Thu, 02 Feb 2017 09:52:38 GMT</pubDate>
    <dc:creator>Senthil_Muthuve</dc:creator>
    <dc:date>2017-02-02T09:52:38Z</dc:date>
    <item>
      <title>Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106839#M69715</link>
      <description>&lt;P&gt;Goodmorning everyone. &lt;/P&gt;&lt;P&gt;A customer has a problem with a query on hive (with Tez). &lt;/P&gt;&lt;P&gt;The query is as follows:&lt;/P&gt;&lt;PRE&gt;select a.hashedaddress from x a join x b on (a.hashedaddress = b.hashedaddress) where a.guid != b.guid;&lt;/PRE&gt;&lt;P&gt;The process is stopped at the first step of mapping:&lt;/P&gt;&lt;PRE&gt;VERTICES	STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1           FAILED     62          0        0       62      43       7 

Map 3           KILLED     62          0        0       62       0       0 

Reducer 2       KILLED   1009          0        0     1009       0       0 

VERTICES: 00/03  [&amp;gt;&amp;gt;--------------------------] 0%    ELAPSED TIME: 26.69 s&lt;/PRE&gt;&lt;P&gt;Log error is this: &lt;A href="https://community.cloudera.com/legacyfs/online/attachments/1483-vertexerror.txt"&gt;vertexerror.txt&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The table schema is as follows:&lt;/P&gt;&lt;PRE&gt;CREATE TABLE `x`(
  `guid` string,
  `brandname` string, `hashedaddress` string, [and more]) PARTITIONED BY (
  `calendardate` date) CLUSTERED BY (
  brandname) 

INTO 5 BUCKETS ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 

STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 

OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 

LOCATION
  '*****' 

TBLPROPERTIES (
  'transient_lastDdlTime'='1445504632')&lt;/PRE&gt;&lt;P&gt;Table contains some 20 million records, to do the tests we tried to apply a limit to the tables:&lt;/P&gt;&lt;PRE&gt;select a.hashedaddress from (select * from x limit 10000) a join (select * from x limit 10000) b on (a.hashedaddress = b.hashedaddress) where a.guid != b.guid;&lt;/PRE&gt;&lt;P&gt;The query runs without problems&lt;/P&gt;&lt;PRE&gt;VERTICES      	STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..........SUCCEEDED     62         62        0        0       0       0 

Map 3 ..........SUCCEEDED     62         62        0        0       0       0 

Reducer 2 ......SUCCEEDED      1          1        0        0       0       0 

Reducer 4 ......SUCCEEDED      1          1        0        0       0       0 

VERTICES: 04/04  [==========================&amp;gt;&amp;gt;] 100%  ELAPSED TIME: 55.13 s&lt;/PRE&gt;Then we have increased the limits up to the cap (100000. 1000000, 10000000) and it works correctly,
After this changing the query removing the limitations and it works:&lt;PRE&gt;select a.hashedaddress from (select * from x) a join (select * from x) b on (a.hashedaddress = b.hashedaddress) where a.guid != b.guid;&lt;/PRE&gt;&lt;PRE&gt;VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1 ..........SUCCEEDED     62         62        0        0       0       0 

Map 3 ..........SUCCEEDED     62         62        0        0       0       0 

Reducer 2 ......SUCCEEDED     253        253       0        0       0       0 

VERTICES: 03/03  [==========================&amp;gt;&amp;gt;] 100%  ELAPSED TIME: 
&lt;/PRE&gt;&lt;P&gt;Can you help me understand why not work the first query? The join does it with the same data. Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jan 2016 19:56:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106839#M69715</guid>
      <dc:creator>aubaldi</dc:creator>
      <dc:date>2016-01-21T19:56:02Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106840#M69716</link>
      <description>&lt;P&gt;What is the HDP or Hive version?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jan 2016 00:31:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106840#M69716</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2016-01-22T00:31:35Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106841#M69717</link>
      <description>&lt;P&gt;hive 0.14.0.2.2 on HDP-2.2.7.1-10
ps: is hdinsight distribution.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jan 2016 00:46:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106841#M69717</guid>
      <dc:creator>aubaldi</dc:creator>
      <dc:date>2016-01-22T00:46:29Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106842#M69718</link>
      <description>&lt;P&gt;Please provide the explain plan of the failing query. Also can you try the failing query with hive.vectorized.execution.enabled=false?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jan 2016 01:04:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106842#M69718</guid>
      <dc:creator>deepesh1</dc:creator>
      <dc:date>2016-01-22T01:04:45Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106843#M69719</link>
      <description>&lt;P&gt;Hi Deepesh.&lt;/P&gt;&lt;P&gt;Tonight the customer tries to disable the vectorization, when give me a feedback I update you.&lt;/P&gt;&lt;P&gt;For the moment, thank you!&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jan 2016 22:24:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106843#M69719</guid>
      <dc:creator>aubaldi</dc:creator>
      <dc:date>2016-01-22T22:24:22Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106844#M69720</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/1391/ubaldi.html" nodeid="1391"&gt;@Alessio Ubaldi&lt;/A&gt;&lt;P&gt;Just to test ...can you run the query withouth vectorization?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;set hive.vectorized.execution.enabled=false&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 24 Jan 2016 02:33:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106844#M69720</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-01-24T02:33:19Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106845#M69721</link>
      <description>&lt;P&gt;Hive doesn't support non-equijoins yet. This is coming soon.&lt;/P&gt;</description>
      <pubDate>Sun, 24 Jan 2016 17:09:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106845#M69721</guid>
      <dc:creator>sball</dc:creator>
      <dc:date>2016-01-24T17:09:03Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106846#M69722</link>
      <description>&lt;P&gt;Thanks for answer!&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jan 2016 17:40:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106846#M69722</guid>
      <dc:creator>aubaldi</dc:creator>
      <dc:date>2016-01-25T17:40:31Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106847#M69723</link>
      <description>&lt;P&gt;@Deepesh suggested this solution. &lt;/P&gt;&lt;P&gt;Friday I informed the customer, I still waiting for a response.
Just answer me I'll know you. &lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jan 2016 17:45:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106847#M69723</guid>
      <dc:creator>aubaldi</dc:creator>
      <dc:date>2016-01-25T17:45:43Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106848#M69724</link>
      <description>&lt;P&gt;Finally the customer has applied the change (hive.vectorized.execution.enabled=false).
Everything works correctly now.&lt;/P&gt;&lt;P&gt;Thanks so much&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 16:24:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106848#M69724</guid>
      <dc:creator>aubaldi</dc:creator>
      <dc:date>2016-02-03T16:24:28Z</dc:date>
    </item>
    <item>
      <title>Re: Hive query issue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106849#M69725</link>
      <description>&lt;P&gt;Can you also post the time taken to execute the 20 mi records? thnx ! &lt;/P&gt;</description>
      <pubDate>Thu, 02 Feb 2017 09:52:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-query-issue/m-p/106849#M69725</guid>
      <dc:creator>Senthil_Muthuve</dc:creator>
      <dc:date>2017-02-02T09:52:38Z</dc:date>
    </item>
  </channel>
</rss>

