<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark DataFrame - difference between sort and orderBy functions? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225890#M187751</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13511/dineshchitlangia.html" nodeid="13511"&gt;@Dinesh Chitlangia&lt;/A&gt;&lt;/P&gt;&lt;P&gt;orderBy is just an alias for the sort function, so both give the same result.&lt;/P&gt;&lt;P&gt;The following is from the Scaladoc in the Spark source (Dataset.scala):&lt;/P&gt;&lt;PRE&gt;/**
   * Returns a new Dataset sorted by the given expressions.
   * This is an alias of the `sort` function.
   *
   * @group typedrel
   * @since 2.0.0
   */
  @scala.annotation.varargs
  def orderBy(sortCol: String, sortCols: String*): Dataset[T] = sort(sortCol, sortCols : _*)
&lt;/PRE&gt;&lt;P&gt;Both perform a total sort across all partitions. To understand how Spark carries out a sort, see the explanation at the link below:&lt;/P&gt;&lt;P&gt;&lt;A href="http://stackoverflow.com/questions/32887595/how-does-spark-achieve-sort-order" target="_blank"&gt;http://stackoverflow.com/questions/32887595/how-does-spark-achieve-sort-order&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you only need to sort within each partition, you can use repartitionAndSortWithinPartitions on a key-value RDD:&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/rdd/OrderedRDDFunctions.html#repartitionAndSortWithinPartitions%28org.apache.spark.Partitioner%29"&gt;https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/rdd/OrderedRDDFunctions.html#repartitionAndSortWithinPartitions(org.apache.spark.Partitioner)&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 10 May 2017 21:25:49 GMT</pubDate>
    <dc:creator>egarelnabi</dc:creator>
    <dc:date>2017-05-10T21:25:49Z</dc:date>
    <item>
      <title>Spark DataFrame - difference between sort and orderBy functions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225889#M187750</link>
      <description>&lt;P&gt;I would like to understand whether there is any functional difference between the sort and orderBy functions on a DataFrame.&lt;/P&gt;&lt;P&gt;Does either one perform a total-order sort across all partitions, or do they only sort the data within each partition, with no guarantee of total order?&lt;/P&gt;&lt;P&gt;Based on the answer, I would like to know when to use each function.&lt;/P&gt;</description>
      <pubDate>Wed, 10 May 2017 11:36:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225889#M187750</guid>
      <dc:creator>dineshc</dc:creator>
      <dc:date>2017-05-10T11:36:01Z</dc:date>
    </item>
    <item>
      <title>Re: Spark DataFrame - difference between sort and orderBy functions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225890#M187751</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13511/dineshchitlangia.html" nodeid="13511"&gt;@Dinesh Chitlangia&lt;/A&gt;&lt;/P&gt;&lt;P&gt;orderBy is just an alias for the sort function, so both give the same result.&lt;/P&gt;&lt;P&gt;The following is from the Scaladoc in the Spark source (Dataset.scala):&lt;/P&gt;&lt;PRE&gt;/**
   * Returns a new Dataset sorted by the given expressions.
   * This is an alias of the `sort` function.
   *
   * @group typedrel
   * @since 2.0.0
   */
  @scala.annotation.varargs
  def orderBy(sortCol: String, sortCols: String*): Dataset[T] = sort(sortCol, sortCols : _*)
&lt;/PRE&gt;&lt;P&gt;Both perform a total sort across all partitions. To understand how Spark carries out a sort, see the explanation at the link below:&lt;/P&gt;&lt;P&gt;&lt;A href="http://stackoverflow.com/questions/32887595/how-does-spark-achieve-sort-order" target="_blank"&gt;http://stackoverflow.com/questions/32887595/how-does-spark-achieve-sort-order&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you only need to sort within each partition, you can use repartitionAndSortWithinPartitions on a key-value RDD:&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/rdd/OrderedRDDFunctions.html#repartitionAndSortWithinPartitions%28org.apache.spark.Partitioner%29"&gt;https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/rdd/OrderedRDDFunctions.html#repartitionAndSortWithinPartitions(org.apache.spark.Partitioner)&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 10 May 2017 21:25:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225890#M187751</guid>
      <dc:creator>egarelnabi</dc:creator>
      <dc:date>2017-05-10T21:25:49Z</dc:date>
    </item>
    <item>
      <title>Re: Spark DataFrame - difference between sort and orderBy functions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225891#M187752</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13511/dineshchitlangia.html" nodeid="13511"&gt;@Dinesh Chitlangia&lt;/A&gt; &lt;/P&gt;&lt;P&gt;sort and orderBy are the same in Spark; they work in exactly the same way. In Hive, however, SORT BY and ORDER BY behave quite differently. For the differences in Hive, refer to the link below:&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 May 2017 02:23:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225891#M187752</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2017-05-11T02:23:49Z</dc:date>
    </item>
    <item>
      <title>Re: Spark DataFrame - difference between sort and orderBy functions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225892#M187753</link>
      <description>&lt;P&gt;sort and orderBy are the same in Spark.&lt;/P&gt;&lt;P&gt;orderBy is an alias for sort in the Dataset API:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala"&gt;https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Apr 2019 15:09:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225892#M187753</guid>
      <dc:creator>Mikhil</dc:creator>
      <dc:date>2019-04-22T15:09:31Z</dc:date>
    </item>
    <item>
      <title>Re: Spark DataFrame - difference between sort and orderBy functions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225893#M187754</link>
      <description>&lt;P&gt;sort and orderBy are the same in Spark.&lt;/P&gt;&lt;P&gt;orderBy is an alias for sort in Dataset:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala"&gt;https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Apr 2019 15:10:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/225893#M187754</guid>
      <dc:creator>Mikhil</dc:creator>
      <dc:date>2019-04-22T15:10:57Z</dc:date>
    </item>
    <item>
      <title>Re: Spark DataFrame - difference between sort and orderBy functions?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/294260#M217150</link>
      <description>&lt;P&gt;The SQL clauses SORT BY and ORDER BY are actually not the same.&lt;/P&gt;&lt;P&gt;SORT BY sorts data within each partition, while ORDER BY is a global sort.&lt;/P&gt;&lt;P&gt;SORT BY corresponds to the sortWithinPartitions() function, while ORDER BY corresponds to sort().&lt;/P&gt;&lt;P&gt;Both of these functions call sortInternal(), but with a different global flag:&lt;/P&gt;&lt;PRE&gt;def sortWithinPartitions ...
sortInternal(global = false, sortExprs)

def sort ...
sortInternal(global = true, sortExprs)&lt;/PRE&gt;</description>
      <pubDate>Fri, 17 Apr 2020 21:24:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-DataFrame-difference-between-sort-and-orderBy/m-p/294260#M217150</guid>
      <dc:creator>HasanAmmori</dc:creator>
      <dc:date>2020-04-17T21:24:19Z</dc:date>
    </item>
  </channel>
</rss>

