<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Limit number of parquet files when doing an insert/create in impala in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Limit-number-of-parquet-files-when-doing-an-insert-create-in/m-p/60803#M69337</link>
    <description>&lt;P&gt;I am doing something like this:&lt;/P&gt;&lt;PRE&gt;create table test2 stored as parquet as select * from t1;&lt;/PRE&gt;&lt;P&gt;I would like to make sure that only a fixed number of Parquet files (say, 2) are created. Is this possible?&lt;/P&gt;&lt;P&gt;As far as I know, there is no predictable threshold for how many files will be created.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 12:22:53 GMT</pubDate>
    <dc:creator>gimp077</dc:creator>
    <dc:date>2022-09-16T12:22:53Z</dc:date>
    <item>
      <title>Limit number of parquet files when doing an insert/create in impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Limit-number-of-parquet-files-when-doing-an-insert-create-in/m-p/60803#M69337</link>
      <description>&lt;P&gt;I am doing something like this:&lt;/P&gt;&lt;PRE&gt;create table test2 stored as parquet as select * from t1;&lt;/PRE&gt;&lt;P&gt;I would like to make sure that only a fixed number of Parquet files (say, 2) are created. Is this possible?&lt;/P&gt;&lt;P&gt;As far as I know, there is no predictable threshold for how many files will be created.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:22:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Limit-number-of-parquet-files-when-doing-an-insert-create-in/m-p/60803#M69337</guid>
      <dc:creator>gimp077</dc:creator>
      <dc:date>2022-09-16T12:22:53Z</dc:date>
    </item>
    <item>
      <title>Re: Limit number of parquet files when doing an insert/create in impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Limit-number-of-parquet-files-when-doing-an-insert-create-in/m-p/60810#M69338</link>
      <description>&lt;P&gt;You can run "set NUM_NODES=1" in your session (before your query), which causes the query to be processed on a single node (the coordinator only). It will then write a single file, as long as the output fits within the default maximum Parquet file size; larger output is split into additional files.&lt;/P&gt;&lt;P&gt;You can then use "set PARQUET_FILE_SIZE=XX" to tune that maximum file size up or down until the output splits into exactly 2 files. Expect some trial and error, because this setting is an upper bound: in my experience the files actually written are noticeably smaller than the limit.&lt;/P&gt;&lt;P&gt;Beware, though: the docs state that NUM_NODES is not intended for production use, especially on big tables, because it can put a lot of pressure on a single host and crash that impalad.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_query_options.html" target="_blank"&gt;https://www.cloudera.com/documentation/enterprise/latest/topics/impala_query_options.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;-m&lt;/P&gt;</description>
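      <!--
      A minimal sketch of the NUM_NODES=1 approach above, written in Python only to make the PARQUET_FILE_SIZE arithmetic explicit. Assumptions: total_parquet_bytes is your own estimate of the compressed output size (for example, the Size column of SHOW TABLE STATS t1 if t1 is already stored as Parquet), and starting_parquet_file_size and single_node_ctas are hypothetical helpers, not part of Impala. Dividing the total by the target file count gives only a first guess; because the files written tend to be smaller than the limit, you may need to raise the value and rerun.

```python
import math
from typing import List


# Hypothetical helper: a first guess for PARQUET_FILE_SIZE (in bytes).
# If every file filled up to this limit, the output would split into
# target_files files; in practice files often come out smaller.
def starting_parquet_file_size(total_parquet_bytes: int, target_files: int) -> int:
    if total_parquet_bytes <= 0 or target_files <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(total_parquet_bytes / target_files)


# Hypothetical helper: the impala-shell statements for the single-node CTAS.
def single_node_ctas(target_files: int, total_parquet_bytes: int) -> List[str]:
    size = starting_parquet_file_size(total_parquet_bytes, target_files)
    return [
        "SET NUM_NODES=1;",                # run the query on the coordinator only
        f"SET PARQUET_FILE_SIZE={size};",  # per-file upper bound, in bytes
        "CREATE TABLE test2 STORED AS PARQUET AS SELECT * FROM t1;",
        "SET NUM_NODES=0;",                # 0 restores the default (all nodes)
    ]


for stmt in single_node_ctas(2, 300 * 1024 * 1024):
    print(stmt)
```

      Paste the printed statements into impala-shell in order. Resetting NUM_NODES to 0 at the end keeps later queries in the same session from being confined to the coordinator.
      -->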
      <pubDate>Wed, 11 Oct 2017 03:05:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Limit-number-of-parquet-files-when-doing-an-insert-create-in/m-p/60810#M69338</guid>
      <dc:creator>mauricio</dc:creator>
      <dc:date>2017-10-11T03:05:30Z</dc:date>
    </item>
    <item>
      <title>Re: Limit number of parquet files when doing an insert/create in impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Limit-number-of-parquet-files-when-doing-an-insert-create-in/m-p/60812#M69339</link>
      <description>&lt;P&gt;Another option I forgot to mention: if your table is partitioned and your insert query uses dynamic partitioning, it will generate one file per partition:&lt;/P&gt;&lt;PRE&gt;insert into table2 partition(par1,par2) select col1, col2 .. colN, par1, par2 from table1;&lt;/PRE&gt;&lt;P&gt;Again, each file is capped at the currently configured maximum Parquet file size, so you can adjust PARQUET_FILE_SIZE to get 2 files per partition.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_partitioning.html#partition_static_dynamic" target="_blank"&gt;https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_partitioning.html#partition_static_dynamic&lt;/A&gt;&lt;/P&gt;</description>
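      <!--
      A minimal sketch, assuming you generate the dynamic-partition INSERT above programmatically. dynamic_partition_insert is a hypothetical helper, not an Impala API; it only encodes the shape of the statement: the partition key columns are listed in the PARTITION clause without values and appear last in the SELECT list, in the same order.

```python
from typing import Sequence


# Hypothetical helper: builds a dynamic-partition INSERT statement.
# Partition keys go in PARTITION (...) without values and are appended
# to the end of the SELECT list in the same order.
def dynamic_partition_insert(target: str, source: str,
                             data_cols: Sequence[str],
                             partition_cols: Sequence[str]) -> str:
    if not partition_cols:
        raise ValueError("a dynamic partition insert needs partition columns")
    select_list = ", ".join(list(data_cols) + list(partition_cols))
    return (f"INSERT INTO {target} PARTITION ({', '.join(partition_cols)}) "
            f"SELECT {select_list} FROM {source};")


print(dynamic_partition_insert("table2", "table1", ["col1", "col2"], ["par1", "par2"]))
```

      As in the answer, run SET PARQUET_FILE_SIZE first in the same session if you want to steer how many files each partition gets.
      -->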
      <pubDate>Wed, 11 Oct 2017 03:10:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Limit-number-of-parquet-files-when-doing-an-insert-create-in/m-p/60812#M69339</guid>
      <dc:creator>mauricio</dc:creator>
      <dc:date>2017-10-11T03:10:31Z</dc:date>
    </item>
  </channel>
</rss>

