Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

ERROR: Parquet file should not be split into multiple hdfs-blocks

Explorer

Hello,

 

I'm trying to use the Parquet file format, and it works fine if I write data using Impala and read it in Hive. However, if I insert data into that table via Hive and read it using Impala, Impala throws errors like:

 

ERRORS:

Backend 2: Parquet file should not be split into multiple hdfs-blocks

...

 

It seems this error is not fatal, and Impala is still able to return the query results. What might be the cause, and how can I avoid this error?

 

Thanks!

1 ACCEPTED SOLUTION

Expert Contributor

I solved it by increasing the block size to the largest possible size of a partition: since one partition is always less than 800 MB, I set the block size for this table to 1 GB, and the warnings no longer appear.

 

T.

 


5 REPLIES

Mentor
How large are your Parquet input files?

If you are copying your files around, have you made sure to follow the block size preservation method mentioned at http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...

Within Hive, you can perhaps run "set dfs.blocksize=1g;" before issuing the queries that create the files.
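
For example (a sketch only; my_parquet_table and my_source_table are placeholder names), a Hive session could look like this:

-- Raise the HDFS block size so each Parquet file fits in a single block,
-- and keep the Parquet row-group size at or below it.
-- Values are in bytes (1073741824 bytes = 1 GB).
SET dfs.blocksize=1073741824;
SET parquet.block.size=1073741824;

-- Write the Parquet files with those settings in effect.
INSERT OVERWRITE TABLE my_parquet_table
SELECT * FROM my_source_table;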

Expert Contributor

I had the same issue: I created a partitioned table stored as Parquet in Hive and loaded it with data.

Then, when running the query in Impala, I got the same error message.

 

I tried these settings in Hive before running the insert, but the files produced are larger than the HDFS block size (128 MB):

SET parquet.block.size=128000000;
SET dfs.blocksize=128000000;

 

Can anybody give me some advice?

Tomas

 

Expert Contributor

I solved it by increasing the block size to the largest possible size of a partition: since one partition is always less than 800 MB, I set the block size for this table to 1 GB, and the warnings no longer appear.
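
For example, something along these lines before the insert (a sketch of what I did, with placeholder names; adjust the values to your largest partition):

-- 1 GB block size, in bytes: even the largest (~800 MB) partition
-- then fits within a single HDFS block.
SET dfs.blocksize=1073741824;
SET parquet.block.size=1073741824;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- my_partitioned_table and my_source are placeholder names;
-- with dynamic partitioning the partition column must come last in the SELECT.
INSERT OVERWRITE TABLE my_partitioned_table PARTITION (part_col)
SELECT col_a, col_b, part_col FROM my_source;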

 

T.

 

New Member

How can this be done when writing data from a Pig script?

 

Explorer

I'm running into something similar. I'm on 5.4.2, building tables with Hive and then analyzing them with Impala, and I get the same warnings, although the queries execute OK.

Can you please share the script you used to set the block size for the table to 1 GB when each partition is always less than 800 MB, as you mention in your post?