ERROR: Parquet file should not be split into multiple hdfs-blocks
Labels: Apache Hive, Apache Impala, HDFS
Created on 06-21-2014 01:04 AM - edited 09-16-2022 02:00 AM
Hello,
I'm trying to use the Parquet file format, and it works fine if I write data using Impala and read it in Hive. However, if I insert data into that table via Hive and read it using Impala, Impala throws errors like:
ERRORS:
Backend 2: Parquet file should not be split into multiple hdfs-blocks
...
It seems this error is not fatal and Impala is still able to return the query results. What might be the cause, and how can I avoid this error?
Thanks!
Created 07-20-2014 08:20 AM
If you are copying your files around, have you made sure to follow the block-size preservation method mentioned at http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im... ?
Within Hive, you can perhaps run "set dfs.blocksize=1g;" before issuing the queries that create the files, roughly as sketched below.
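For example, a minimal sketch (the table and column names here are just placeholders, and the block size is written out in bytes):
SET dfs.blocksize=1073741824;   -- 1 GB, the byte equivalent of the 1g shorthand
CREATE TABLE my_parquet_table (id BIGINT, payload STRING) STORED AS PARQUET;
INSERT OVERWRITE TABLE my_parquet_table
SELECT id, payload FROM my_source_table;
The SET only affects files written in the current session, so it has to be issued before the INSERT that produces the Parquet files.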
Created 10-30-2014 03:06 AM
Had the same issue: I created a partitioned table stored as Parquet in Hive and loaded it with data.
Then, when running a query on it in Impala, I got the same error message.
I tried these settings in Hive before running the insert, but the files produced are still larger than the HDFS block size (128 MB):
SET parquet.block.size=128000000;
SET dfs.blocksize=128000000;
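One way to check the sizes of the files Hive actually wrote is to list the table directory from the Hive shell (the warehouse path here is just a placeholder):
dfs -ls /user/hive/warehouse/mydb.db/my_partitioned_table/part_col=1;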
Can anybody give any advice?
Tomas
Created 11-07-2014 06:08 AM
I solved it by increasing the block size to cover the largest possible partition: since each partition is always less than 800 MB, I set the block size for this table to 1 GB, and the warnings no longer appear.
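Roughly, the Hive session looked like this (the table, column, and partition names below are placeholders):
SET dfs.blocksize=1073741824;   -- 1 GB, larger than the biggest (~800 MB) partition
INSERT OVERWRITE TABLE my_parquet_table PARTITION (part_col='2014-11-07')
SELECT id, payload FROM my_staging_table WHERE part_col='2014-11-07';
With a 1 GB block size and partitions that never exceed roughly 800 MB, each Parquet file fits inside a single HDFS block, so Impala no longer reports the warning.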
T.
Created 12-18-2014 01:22 PM
How can this be done when writing data from a Pig script?
Created 11-23-2015 12:59 PM
I'm running into something similar. I'm on 5.4.2, building tables with Hive and then analyzing them with Impala, and I get the same warnings, although the queries execute OK.
Can you please share the script you used to set the block size for the table to 1 GB when each partition is always less than 800 MB, as you mention in your post?