<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Can we sort a column of a Hive table just before query? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-we-sort-a-column-of-a-Hive-table-just-before-query/m-p/180572#M75299</link>
    <description>&lt;P&gt;My Hive table is in ORC format, and queries on it run fastest when the columns in the WHERE clause are sorted. In my case they currently are not. What is the syntax to sort a column just before a query?&lt;/P&gt;</description>
    <pubDate>Mon, 05 Mar 2018 12:28:38 GMT</pubDate>
    <dc:creator>Hadoopy</dc:creator>
    <dc:date>2018-03-05T12:28:38Z</dc:date>
    <item>
      <title>Can we sort a column of a Hive table just before query?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-we-sort-a-column-of-a-Hive-table-just-before-query/m-p/180572#M75299</link>
      <description>&lt;P&gt;My Hive table is in ORC format, and queries on it run fastest when the columns in the WHERE clause are sorted. In my case they currently are not. What is the syntax to sort a column just before a query?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Mar 2018 12:28:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-we-sort-a-column-of-a-Hive-table-just-before-query/m-p/180572#M75299</guid>
      <dc:creator>Hadoopy</dc:creator>
      <dc:date>2018-03-05T12:28:38Z</dc:date>
    </item>
    <item>
      <title>Re: Can we sort a column of a Hive table just before query?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-we-sort-a-column-of-a-Hive-table-just-before-query/m-p/180573#M75300</link>
      <description>&lt;P&gt;If I understand your question properly, you have an unsorted ORC table, and you want to "sort" the data "before" querying it. That does not really make sense, since you would be running one query to produce sorted data just to run another query on top of it.&lt;/P&gt;&lt;P&gt;Sorting can be a costly operation, depending on how you implement it. However, there are several other options you can use at query time to speed up your queries. Some details follow.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Use the &lt;B&gt;Tez&lt;/B&gt; execution engine. It is generally much faster than the traditional MapReduce (MR) jobs launched by Hive.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Enable predicate pushdown (PPD) to filter at the storage layer:&lt;/STRONG&gt;
&lt;P&gt;SET hive.optimize.ppd=true;&lt;/P&gt;&lt;P&gt;SET hive.optimize.ppd.storage=true;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Enable vectorized query execution, which processes data in batches of 1024 rows instead of one row at a time:&lt;/STRONG&gt;
&lt;P&gt;SET hive.vectorized.execution.enabled=true;&lt;/P&gt;&lt;P&gt;SET hive.vectorized.execution.reduce.enabled=true;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Enable the Cost Based Optimizer (CBO) for cost-based query planning, and let it fetch table statistics:&lt;/STRONG&gt;
&lt;P&gt;SET hive.cbo.enable=true;&lt;/P&gt;&lt;P&gt;SET hive.compute.query.using.stats=true;&lt;/P&gt;&lt;P&gt;SET hive.stats.fetch.column.stats=true;&lt;/P&gt;&lt;P&gt;SET hive.stats.fetch.partition.stats=true;&lt;/P&gt;&lt;P&gt;With these settings, partition and column statistics are fetched from the metastore. Use this with caution: if you have too many partitions and/or columns, it could degrade performance.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Control reducer output:&lt;/STRONG&gt;
&lt;P&gt;SET hive.tez.auto.reducer.parallelism=true;&lt;/P&gt;&lt;P&gt;SET hive.tez.max.partition.factor=20;&lt;/P&gt;&lt;P&gt;SET hive.exec.reducers.bytes.per.reducer=128000000;&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Also, you may want to look at the best practices for creating ORC tables, &lt;A href="https://community.hortonworks.com/articles/75501/orc-creation-best-practices.html"&gt;mentioned here&lt;/A&gt;, so that you can get the most out of your queries in the least time!&lt;/P&gt;&lt;P&gt;Hope that helps!&lt;/P&gt;</description>
      <pubDate>Mon, 05 Mar 2018 12:52:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-we-sort-a-column-of-a-Hive-table-just-before-query/m-p/180573#M75300</guid>
      <dc:creator>RahulSoni</dc:creator>
      <dc:date>2018-03-05T12:52:25Z</dc:date>
    </item>
    <item>
      <title>Re: Can we sort a column of a Hive table just before query?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-we-sort-a-column-of-a-Hive-table-just-before-query/m-p/180574#M75301</link>
      <description>&lt;P&gt;Would using DISTRIBUTE BY or SORT BY be helpful?&lt;/P&gt;</description>
      <pubDate>Mon, 05 Mar 2018 14:09:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-we-sort-a-column-of-a-Hive-table-just-before-query/m-p/180574#M75301</guid>
      <dc:creator>Hadoopy</dc:creator>
      <dc:date>2018-03-05T14:09:15Z</dc:date>
    </item>
  </channel>
</rss>

