<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to distribute impala table partitions in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34762#M11263</link>
    <description>&lt;P&gt;It may help if you describe your use case here, i.e., your goal with this operation. There may be several ways to reach that goal.&lt;/P&gt;</description>
    <pubDate>Fri, 04 Dec 2015 16:35:56 GMT</pubDate>
    <dc:creator>jkestelyn</dc:creator>
    <dc:date>2015-12-04T16:35:56Z</dc:date>
    <item>
      <title>How to distribute impala table partitions</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34702#M11260</link>
      <description>&lt;P&gt;Is there a way to distribute Impala table partitions across multiple HDFS data nodes without replication?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Bhaskar&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:51:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34702#M11260</guid>
      <dc:creator>Bsinghal</dc:creator>
      <dc:date>2022-09-16T09:51:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to distribute impala table partitions</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34739#M11261</link>
      <description>&lt;P&gt;Bhaskar,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The answer is "yes" (hat tip to John Russell) because HDFS can place data blocks on any data node, even with a replication factor of&amp;nbsp;1.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, you need to be careful: if you distribute your partitions/Parquet files across the cluster at too fine a granularity, performance can suffer. Performance will be better and more predictable with fewer blocks for your query to find.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Dec 2015 22:26:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34739#M11261</guid>
      <dc:creator>jkestelyn</dc:creator>
      <dc:date>2015-12-03T22:26:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to distribute impala table partitions</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34745#M11262</link>
      <description>&lt;P&gt;Thanks a lot for the reply.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there some argument/parameter I can specify with CREATE TABLE in Impala to ensure HDFS distributes data blocks across multiple data nodes? If not, how do I do this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Bhaskar&lt;/P&gt;&lt;P&gt;P.S. I'm just getting started with HDFS/Impala/Hadoop/Kudu.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Dec 2015 03:49:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34745#M11262</guid>
      <dc:creator>Bsinghal</dc:creator>
      <dc:date>2015-12-04T03:49:12Z</dc:date>
    </item>
    <item>
      <title>Re: How to distribute impala table partitions</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34762#M11263</link>
      <description>&lt;P&gt;It may help if you describe your use case here, i.e., your goal with this operation. There may be several ways to reach that goal.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Dec 2015 16:35:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/34762#M11263</guid>
      <dc:creator>jkestelyn</dc:creator>
      <dc:date>2015-12-04T16:35:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to distribute impala table partitions</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/35042#M11264</link>
      <description>&lt;P&gt;Sorry for the late response.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am not looking at any particular use case. I'm just trying to see how an Impala query is executed when the data is distributed across multiple HDFS data nodes. It's an experimental setup, so performance is currently irrelevant; it's more about getting an in-depth understanding.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the query execution plan I want to observe SCAN_HDFS and AGGREGATION.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Bhaskar&lt;/P&gt;</description>
      <pubDate>Thu, 10 Dec 2015 13:31:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/35042#M11264</guid>
      <dc:creator>Bsinghal</dc:creator>
      <dc:date>2015-12-10T13:31:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to distribute impala table partitions</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/35625#M11265</link>
      <description>&lt;P&gt;Impala does not control the physical locations of the HDFS blocks underlying Impala tables.&lt;BR /&gt;&lt;BR /&gt;The tables in Impala are backed by files on HDFS. Those files are chopped into blocks and distributed according to your HDFS configuration, but for all practical purposes the blocks are distributed round-robin among the data nodes (grossly simplified). Impala queries typically run on all data nodes that store data relevant to answering a particular query. So, given a fixed amount of data, you can indirectly control Impala's degree of (inter-node) parallelism by changing the HDFS block size: a smaller block size splits the same data into more blocks, which can be spread across more data nodes. More blocks == more parallelism.&lt;BR /&gt;&lt;BR /&gt;If you are interested in learning about Impala, you may also find the CIDR paper useful:&lt;BR /&gt;&lt;A href="http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf" target="_blank"&gt;http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Dec 2015 23:34:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-distribute-impala-table-partitions/m-p/35625#M11265</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2015-12-29T23:34:49Z</dc:date>
    </item>
  </channel>
</rss>

