<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: ERROR: Parquet file should not be split into multiple hdfs-blocks in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/34348#M2035</link>
    <description>&lt;P&gt;I'm running into something similar. &amp;nbsp;I'm on 5.4.2, building tables with Hive and then analyzing them with Impala, and I get the same warnings, although the queries execute OK.&lt;/P&gt;&lt;P&gt;Can you please share the script you used to achieve "&lt;SPAN&gt;when one partition is always less than 800MB I set the block size for this table to 1GB&lt;/SPAN&gt;", as you mention in your post?&lt;/P&gt;</description>
    <pubDate>Mon, 23 Nov 2015 20:59:16 GMT</pubDate>
    <dc:creator>James K</dc:creator>
    <dc:date>2015-11-23T20:59:16Z</dc:date>
    <item>
      <title>ERROR: Parquet file should not be split into multiple hdfs-blocks</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/13952#M2030</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm trying to use the Parquet file format, and it works fine if I write data using Impala and read it in Hive. However, if I insert data into that table via Hive and read it using Impala, Impala throws errors like:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ERRORS:&lt;/P&gt;&lt;P&gt;Backend 2: Parquet file should not be split into multiple hdfs-blocks&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It seems that this error is not fatal, and Impala is still able to return the query results. What might be the cause, and how can I avoid this error?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:00:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/13952#M2030</guid>
      <dc:creator>ygnhzeus</dc:creator>
      <dc:date>2022-09-16T09:00:50Z</dc:date>
    </item>
    <item>
      <title>Re: ERROR: Parquet file should not be split into multiple hdfs-blocks</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/15612#M2031</link>
      <description>How large are your Parquet input files?&lt;BR /&gt;&lt;BR /&gt;If you are copying your files around, have you ensured you follow the block-size preservation method mentioned at &lt;A target="_blank" href="http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_parquet.html?"&gt;http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_parquet.html?&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Within Hive, you can perhaps run "set dfs.blocksize=1g;" before issuing the queries that create the files.</description>
      <pubDate>Sun, 20 Jul 2014 15:20:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/15612#M2031</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2014-07-20T15:20:06Z</dc:date>
    </item>
    <item>
      <title>Re: ERROR: Parquet file should not be split into multiple hdfs-blocks</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/21042#M2032</link>
      <description>&lt;P&gt;Had the same issue: I created a partitioned table stored as Parquet in Hive and loaded it with data.&lt;/P&gt;&lt;P&gt;Then, when running the query in Impala, I got the same error message.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried these settings in Hive before running the insert, but the files produced are still larger than the HDFS block size (128MB):&lt;/P&gt;&lt;P&gt;SET parquet.block.size=128000000;&lt;BR /&gt;SET dfs.blocksize=128000000;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anybody offer any advice?&lt;/P&gt;&lt;P&gt;Tomas&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Oct 2014 10:06:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/21042#M2032</guid>
      <dc:creator>TomasTF</dc:creator>
      <dc:date>2014-10-30T10:06:53Z</dc:date>
    </item>
    <item>
      <title>Re: ERROR: Parquet file should not be split into multiple hdfs-blocks</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/21303#M2033</link>
      <description>&lt;P&gt;I solved it by increasing the block size to the largest possible value of the partition, so when one partition is always less than 800MB I set the block size for this table to 1GB, and the warnings do not appear anymore.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;T.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Nov 2014 14:08:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/21303#M2033</guid>
      <dc:creator>TomasTF</dc:creator>
      <dc:date>2014-11-07T14:08:45Z</dc:date>
    </item>
    <item>
      <title>Re: ERROR: Parquet file should not be split into multiple hdfs-blocks</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/22883#M2034</link>
      <description>&lt;P&gt;How can this be done when writing data from a Pig script?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Dec 2014 21:22:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/22883#M2034</guid>
      <dc:creator>BrockOwen</dc:creator>
      <dc:date>2014-12-18T21:22:13Z</dc:date>
    </item>
    <item>
      <title>Re: ERROR: Parquet file should not be split into multiple hdfs-blocks</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/34348#M2035</link>
      <description>&lt;P&gt;I'm running into something similar. &amp;nbsp;I'm on 5.4.2, building tables with Hive and then analyzing them with Impala, and I get the same warnings, although the queries execute OK.&lt;/P&gt;&lt;P&gt;Can you please share the script you used to achieve "&lt;SPAN&gt;when one partition is always less than 800MB I set the block size for this table to 1GB&lt;/SPAN&gt;", as you mention in your post?&lt;/P&gt;</description>
      <pubDate>Mon, 23 Nov 2015 20:59:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ERROR-Parquet-file-should-not-be-split-into-multiple-hdfs/m-p/34348#M2035</guid>
      <dc:creator>James K</dc:creator>
      <dc:date>2015-11-23T20:59:16Z</dc:date>
    </item>
  </channel>
</rss>