<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Order by over  subquery return wrong results in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205193#M167179</link>
    <description>&lt;P&gt;Hey &lt;A href="https://community.hortonworks.com/users/37152/nramanaiah.html"&gt;@Naresh P R&lt;/A&gt; , thanks for the response.&lt;/P&gt;&lt;P&gt;HIVE-6348 should solve the issue by removing the order by from the subquery, however the ticket seems to be treating it as an optimization.&lt;/P&gt;&lt;P&gt;Would you know why the data corruption is happening?&lt;/P&gt;</description>
    <pubDate>Thu, 04 Oct 2018 23:49:44 GMT</pubDate>
    <dc:creator>ravi3</dc:creator>
    <dc:date>2018-10-04T23:49:44Z</dc:date>
    <item>
      <title>Order by over  subquery return wrong results</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205191#M167177</link>
      <description>&lt;P&gt;EDIT : A more minimal failure case at the bottom&lt;/P&gt;&lt;P&gt;
 I have a table with definition&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
 &lt;TD&gt;
  CREATE TABLE `testtable`(
  `color_gbakc2` string, 
  `noq_empty_gbakc3` bigint, 
  `color_gbakc1` string)
  &lt;BR /&gt;
  ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
  &lt;BR /&gt;
  STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
  &lt;BR /&gt;
  OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  &lt;BR /&gt;
  LOCATION
  'hdfs://&amp;lt;ip&amp;gt;:8020/apps/hive/warehouse/testtable'
  &lt;BR /&gt;
  TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
  'numFiles'='1',
 &lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;&lt;P&gt;
 The data in it looks like&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
 &lt;TD&gt;
  select * from testTable
  &lt;BR /&gt;
  &lt;BR /&gt;
  testtable.color_gbakc2,testtable.noq_empty_gbakc3,testtable.color_gbakc1
  &lt;BR /&gt;
  "",45456,
  &lt;BR /&gt;
  Black,15681,Black
  &lt;BR /&gt;
  Blue,6203,Blue
  &lt;BR /&gt;
  Multi,6196,Multi
  &lt;BR /&gt;
  Red,7716,Red
  &lt;BR /&gt;
  Silver,5408,Silver
  &lt;BR /&gt;
  White,908,White
  &lt;BR /&gt;
  Yellow,7599,Yellow
 &lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;&lt;P&gt;
 I have a working query&lt;/P&gt;
&lt;PRE&gt;SELECT t_6.noq_empty noq_empty,
       t_6.color color
FROM
  (SELECT t_5.color_gbakc2 color,
          t_5.noq_empty_gbakc3 noq_empty
   FROM
     (SELECT testtable.color_gbakc2 color_gbakc2,
             testtable.noq_empty_gbakc3 noq_empty_gbakc3,
             testtable.color_gbakc1 color_gbakc1
      FROM testtable testtable
      WHERE testtable.color_gbakc2 IN ('Red',
                                       'Blue',
                                       'Green') ) t_5
   WHERE t_5.color_gbakc2 IN ('Red',
                              'Blue')
   ORDER BY noq_empty ASC ) t_6
&lt;/PRE&gt;&lt;P&gt;
 It returns the expected results&lt;/P&gt;
&lt;TABLE&gt;

&lt;TBODY&gt;&lt;TR&gt;
 &lt;TH&gt;
  noq_empty
 &lt;/TH&gt;
 &lt;TH&gt;
  color
 &lt;/TH&gt;
&lt;/TR&gt;

&lt;/TBODY&gt;&lt;TBODY&gt;
&lt;TR&gt;
 &lt;TD&gt;
  6203
 &lt;/TD&gt;
 &lt;TD&gt;
  Blue
 &lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
 &lt;TD&gt;
  7716
 &lt;/TD&gt;
 &lt;TD&gt;
  Red
 &lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;&lt;P&gt;
 But when I add an order by clause I get unexpected result&lt;/P&gt;
&lt;PRE&gt;SELECT t_6.noq_empty noq_empty,
       t_6.color color
FROM
  (SELECT t_5.color_gbakc2 color,
          t_5.noq_empty_gbakc3 noq_empty
   FROM
     (SELECT testtable.color_gbakc2 color_gbakc2,
             testtable.noq_empty_gbakc3 noq_empty_gbakc3,
             testtable.color_gbakc1 color_gbakc1
      FROM testtable testtable
      WHERE testtable.color_gbakc2 IN ('Red',
                                       'Blue',
                                       'Green') ) t_5
   WHERE t_5.color_gbakc2 IN ('Red',
                              'Blue')
   ORDER BY noq_empty ASC ) t_6
ORDER BY color
&lt;/PRE&gt;&lt;P&gt;I get un expected results. Please see the attachment.&lt;/P&gt;&lt;P&gt;Running HDP 2.6.5.&lt;/P&gt;&lt;P&gt;Is this a known issue?&lt;/P&gt;&lt;P&gt;******EDIT 1********&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;BR /&gt;NOT WORKING&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;SELECT 
       t6.amt amt, t6.color color
FROM
  (SELECT t5.color color,
          t5.c1 amt
   FROM
     (SELECT 
             t1.c1 c1,
             t1.c2 AS color from
        (SELECT  7716 AS c1, "Red" AS c2
         UNION SELECT  6203 AS c1, "Blue" AS c2) t1
      WHERE t1.c2 IN ('Red',
                      'Blue',
                      'Green')) t5
   WHERE t5.color IN ('Red',
                      'Blue')
   ORDER BY amt ASC) t6
ORDER BY color
&lt;/PRE&gt;&lt;P&gt;Changing the order of columns in outer most select&lt;BR /&gt;&lt;STRONG&gt;WORKS&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;SELECT 
        t6.color color,t6.amt amt
FROM
  (SELECT t5.color color,
          t5.c1 amt
   FROM
     (SELECT 
             t1.c1 c1,
             t1.c2 AS color from
        (SELECT  7716 AS c1, "Red" AS c2
         UNION SELECT  6203 AS c1, "Blue" AS c2) t1
      WHERE t1.c2 IN ('Red',
                      'Blue',
                      'Green')) t5
   WHERE t5.color IN ('Red',
                      'Blue')
   ORDER BY amt ASC) t6
ORDER BY color&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="91595-screen-shot-2018-10-02-at-111408.png" style="width: 1196px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/17126iB0D02E8A5F937FE7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="91595-screen-shot-2018-10-02-at-111408.png" alt="91595-screen-shot-2018-10-02-at-111408.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 04:53:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205191#M167177</guid>
      <dc:creator>ravi3</dc:creator>
      <dc:date>2019-08-18T04:53:46Z</dc:date>
    </item>
    <item>
      <title>Re: Order by over  subquery return wrong results</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205192#M167178</link>
      <description>&lt;P&gt;HIVE-6348 should resolve this issue. I don't see any workaround for this issue. This fix is available in HDP-3&lt;/P&gt;</description>
      <pubDate>Thu, 04 Oct 2018 16:06:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205192#M167178</guid>
      <dc:creator>nramanaiah</dc:creator>
      <dc:date>2018-10-04T16:06:24Z</dc:date>
    </item>
    <item>
      <title>Re: Order by over  subquery return wrong results</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205193#M167179</link>
      <description>&lt;P&gt;Hey &lt;A href="https://community.hortonworks.com/users/37152/nramanaiah.html"&gt;@Naresh P R&lt;/A&gt; , thanks for the response.&lt;/P&gt;&lt;P&gt;HIVE-6348 should solve the issue by removing the order by from the subquery, however the ticket seems to be treating it as an optimization.&lt;/P&gt;&lt;P&gt;Would you know why the data corruption is happening?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Oct 2018 23:49:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205193#M167179</guid>
      <dc:creator>ravi3</dc:creator>
      <dc:date>2018-10-04T23:49:44Z</dc:date>
    </item>
    <item>
      <title>Re: Order by over  subquery return wrong results</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205194#M167180</link>
      <description>&lt;P&gt;When applying order by twice, hive assumes column selection order is same on both inner &amp;amp; outer query which does order by on wrong column assuming wrong column datatype. You can rewrite your query having outer &amp;amp; inner query projection as same.&lt;/P&gt;&lt;PRE&gt;SELECT t_6.color color, t_6.noq_empty noq_empty
FROM
  (SELECT t_5.color_gbakc2 color,
          t_5.noq_empty_gbakc3 noq_empty
   FROM
     (SELECT testtable.color_gbakc2 color_gbakc2,
             testtable.noq_empty_gbakc3 noq_empty_gbakc3,
             testtable.color_gbakc1 color_gbakc1
      FROM testtable testtable
      WHERE testtable.color_gbakc2 IN ('Red',
                                       'Blue',
                                       'Green') ) t_5
   WHERE t_5.color_gbakc2 IN ('Red',
                              'Blue')
   ORDER BY noq_empty ASC ) t_6
ORDER BY color&lt;/PRE&gt;&lt;P&gt;Fortunately HIVE-6348 is a plan optimizer &amp;amp; its eliminating this issue also.&lt;/P&gt;&lt;P&gt;If my answer helped you, accept the answer. It will help others in the community.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Oct 2018 00:24:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Order-by-over-subquery-return-wrong-results/m-p/205194#M167180</guid>
      <dc:creator>nramanaiah</dc:creator>
      <dc:date>2018-10-05T00:24:53Z</dc:date>
    </item>
  </channel>
</rss>

