<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive on Tez: How to order an array column? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219247#M79413</link>
    <description>&lt;P&gt;I found a similar question here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/24805226/keeping-the-order-of-records-in-hive-collect" target="_blank"&gt;https://stackoverflow.com/questions/24805226/keeping-the-order-of-records-in-hive-collect&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope it helps&lt;/P&gt;</description>
    <pubDate>Sun, 10 Jun 2018 10:13:11 GMT</pubDate>
    <dc:creator>sunile_manjee</dc:creator>
    <dc:date>2018-06-10T10:13:11Z</dc:date>
    <item>
      <title>Hive on Tez: How to order an array column?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219246#M79412</link>
      <description>&lt;P&gt;Hi!  Here is a small excerpt of my table:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;select * from source_table;&lt;/EM&gt;&lt;/P&gt;&lt;PRE&gt;point		id					station		journey
1510764333	"000003825fc3200b8d541e8ee3844522"	"St. George"	1	
1510764342	"000003825fc3200b8d541e8ee3844522"	"St. George"	1
1510764420	"000003825fc3200b8d541e8ee3844522"	"Museum"	1	
1510764448	"000003825fc3200b8d541e8ee3844522"	"Museum"	1	
1510764509	"000003825fc3200b8d541e8ee3844522"	"Queens Park"	1	
1510764779	"000003825fc3200b8d541e8ee3844522"	"St. Andrew"	1	
1510764781	"000003825fc3200b8d541e8ee3844522"	"St. Andrew"	1		
1510704290	"0000650b988dabe0cc4d5923e9572548"	"Museum"	2	
1510704508	"0000650b988dabe0cc4d5923e9572548"	"St. George"	2	&amp;lt;br&amp;gt;&lt;/PRE&gt;&lt;P&gt;I would like all the stations per journey grouped in an array ordered from first to last.  The result should look like this:&lt;/P&gt;&lt;PRE&gt;point		id					station		journey	array
1510764333	"000003825fc3200b8d541e8ee3844522"	"St. George"	1	["St. George","Museum","Queens Park","St. Andrew"]	
1510764342	"000003825fc3200b8d541e8ee3844522"	"St. George"	1	["St. George","Museum","Queens Park","St. Andrew"]
1510764420	"000003825fc3200b8d541e8ee3844522"	"Museum"	1	["St. George","Museum","Queens Park","St. Andrew"]
1510764448	"000003825fc3200b8d541e8ee3844522"	"Museum"	1	["St. George","Museum","Queens Park","St. Andrew"]
1510764509	"000003825fc3200b8d541e8ee3844522"	"Queens Park"	1	["St. George","Museum","Queens Park","St. Andrew"]	
1510764779	"000003825fc3200b8d541e8ee3844522"	"St. Andrew"	1	["St. George","Museum","Queens Park","St. Andrew"]			
1510704290	"0000650b988dabe0cc4d5923e9572548"	"Museum"	2	["Museum","St.George"]
1510704508	"0000650b988dabe0cc4d5923e9572548"	"St. George"	2	["Museum","St.George"]
&lt;/PRE&gt;&lt;P&gt;Unfortunately I cannot get them into the right order.  This code gives me the array but in the incorrect order (see below):&lt;/P&gt;&lt;P&gt;&lt;EM&gt;select point, ID, station, journey,
        collect_set(station) over (partition by journey) station_arr
  from source_table order by journey, point;&lt;/EM&gt;&lt;/P&gt;&lt;PRE&gt;point		id					station		journey	array
1510764333	"000003825fc3200b8d541e8ee3844522"	"St. George"	1	["Museum","St. George","St. Andrew","Queens Park"]	
1510764342	"000003825fc3200b8d541e8ee3844522"	"St. George"	1	["Museum","St. George","St. Andrew","Queens Park"]
1510764420	"000003825fc3200b8d541e8ee3844522"	"Museum"	1	["Museum","St. George","St. Andrew","Queens Park"]
1510764448	"000003825fc3200b8d541e8ee3844522"	"Museum"	1	["Museum","St. George","St. Andrew","Queens Park"]
1510764509	"000003825fc3200b8d541e8ee3844522"	"Queens Park"	1	["Museum","St. George","St. Andrew","Queens Park"]	
1510764779	"000003825fc3200b8d541e8ee3844522"	"St. Andrew"	1	["Museum","St. George","St. Andrew","Queens Park"]		
1510704290	"0000650b988dabe0cc4d5923e9572548"	"Museum"	2	["Museum","St.George"]
1510704508	"0000650b988dabe0cc4d5923e9572548"	"St. George"	2	["Museum","St.George"]
&lt;/PRE&gt;&lt;P&gt;Is there a way to create this array and keep the order consistent to the source table?  I don't remember having these issues on my sandbox VM and wonder if it has something to do with multiple reducers being involved..?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sun, 10 Jun 2018 10:07:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219246#M79412</guid>
      <dc:creator>alexander_witte</dc:creator>
      <dc:date>2018-06-10T10:07:35Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Tez: How to order an array column?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219247#M79413</link>
      <description>&lt;P&gt;I found a similar question here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/24805226/keeping-the-order-of-records-in-hive-collect" target="_blank"&gt;https://stackoverflow.com/questions/24805226/keeping-the-order-of-records-in-hive-collect&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope it helps&lt;/P&gt;</description>
      <pubDate>Sun, 10 Jun 2018 10:13:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219247#M79413</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2018-06-10T10:13:11Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Tez: How to order an array column?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219248#M79414</link>
      <description>&lt;P&gt;Hi Sunile, thanks for the post.  I've been experimenting with ideas in the link you sent and still cannot get these silly arrays to be ordered correctly.    My latest thought is around something like:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;select point, id, station, journey, collect_set(station) over (partition by journey order by point) as array from source_table;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Which will give me an output like this:&lt;/P&gt;&lt;PRE&gt;point		id					station		journey	array
1510764333	"000003825fc3200b8d541e8ee3844522"	"St. George"	1	["St. George"]	
1510764342	"000003825fc3200b8d541e8ee3844522"	"St. George"	1	["St. George"]
1510764420	"000003825fc3200b8d541e8ee3844522"	"Museum"	1	["St. George","Museum"]
1510764448	"000003825fc3200b8d541e8ee3844522"	"Museum"	1	["St. George","Museum"]
1510764509	"000003825fc3200b8d541e8ee3844522"	"Queens Park"	1	["St. George","Museum","Queens Park"]	
1510764779	"000003825fc3200b8d541e8ee3844522"	"St. Andrew"	1	["St. George","Museum","Queens Park","St. Andrew"]			
1510704290	"0000650b988dabe0cc4d5923e9572548"	"Museum"	2	["Museum"]
1510704508	"0000650b988dabe0cc4d5923e9572548"	"St. George"	2	["Museum","St.George"]
&lt;/PRE&gt;&lt;P&gt;Since the order of the last array per journey IS correct I'm trying to get the &lt;EM&gt;last_value(array) over (partition by journey)&lt;/EM&gt; with the hopes of just getting the last value and adding it to each row.  Although I can't seem to get it to work yet...&lt;BR /&gt;Kind of a hacky solution though- surely there is a simpler way of doing this that I'm missing...&lt;/P&gt;&lt;P&gt;&lt;EM&gt;select *, last_value(station_arr) over (partition by id,journey) &lt;/EM&gt;&lt;EM&gt;from &lt;/EM&gt;&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;&lt;EM&gt;(select point, id, station, journey, collect_set(station) over (partition by journey order by point)
from source_table);&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Jun 2018 22:37:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219248#M79414</guid>
      <dc:creator>alexander_witte</dc:creator>
      <dc:date>2018-06-10T22:37:11Z</dc:date>
    </item>
    <item>
      <title>Re: Hive on Tez: How to order an array column?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219249#M79415</link>
      <description>&lt;P&gt;Figured this out- my above approach worked whereby I can order the list and then just grab the last line in the list window.  Like this:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;select *, last_value(station_arr) over (&lt;/EM&gt;&lt;EM&gt;partition by journey order by point RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING&lt;/EM&gt;&lt;EM&gt;) as route &lt;/EM&gt;&lt;EM&gt;from&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;/EM&gt;&lt;EM&gt;(select point, id, station, journey, collect_set(station) over (partition by journey order by point) from source_table);&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;This will take the ordered list and will populate that list to every row in the partition as requested above.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Jun 2018 06:56:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-on-Tez-How-to-order-an-array-column/m-p/219249#M79415</guid>
      <dc:creator>alexander_witte</dc:creator>
      <dc:date>2018-06-12T06:56:12Z</dc:date>
    </item>
  </channel>
</rss>

