Support Questions
Find answers, ask questions, and share your expertise

ERROR: Parquet file should not be split into multiple hdfs-blocks

Solved

ERROR: Parquet file should not be split into multiple hdfs-blocks

New Contributor

Hello,

 

I'm trying to use the Parquet file format, and it works fine when I write data using Impala and read it in Hive. However, if I insert data into the table via Hive and then read it with Impala, Impala throws errors like:

 

ERRORS:

Backend 2: Parquet file should not be split into multiple hdfs-blocks

...

 

This error does not seem to be fatal, since Impala is still able to return the query results. What might be the cause, and how can I avoid this error?

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Re: ERROR: Parquet file should not be split into multiple hdfs-blocks

Rising Star

I solved it by increasing the block size to the largest value a partition can reach: since each partition is always less than 800 MB, I set the block size for this table to 1 GB, and the warnings no longer appear.

 

T.

 

5 REPLIES

Re: ERROR: Parquet file should not be split into multiple hdfs-blocks

Master Guru
How large are your Parquet input files?

If you are copying your files around, have you followed the block-size preservation method mentioned at http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Im...

Within Hive, you can perhaps "set dfs.blocksize=1g;" before issuing the queries that create the files.
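For example, a minimal sketch of that session setting, with 1 GB written out in bytes in case your Hive version does not accept the `1g` suffix (the value is illustrative):

```sql
-- Illustrative value: make the HDFS block size at least as large as the
-- Parquet files Hive will write, so each file occupies a single block.
SET dfs.blocksize=1073741824;   -- 1 GB, in bytes (equivalent to "1g")
```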

Re: ERROR: Parquet file should not be split into multiple hdfs-blocks

Rising Star

I had the same issue: I created a partitioned table stored as Parquet in Hive and loaded it with data.

Then, when running the query in Impala, I got the same error message.

 

I tried these settings in Hive before running the insert, but the files produced are still larger than the HDFS block size (128 MB):

SET parquet.block.size=128000000;
SET dfs.blocksize=128000000;

 

Can anybody offer some advice?

Tomas

 

Re: ERROR: Parquet file should not be split into multiple hdfs-blocks

Rising Star

I solved it by increasing the block size to the largest value a partition can reach: since each partition is always less than 800 MB, I set the block size for this table to 1 GB, and the warnings no longer appear.

 

T.
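As a sketch, the same idea expressed as Hive session settings (the table and column names here are hypothetical; the real point is choosing a block size larger than the biggest partition, here roughly 800 MB rounded up to 1 GB):

```sql
-- Hypothetical table/column names; only the sizing choice mirrors the fix above.
SET dfs.blocksize=1073741824;       -- 1 GB: larger than the ~800 MB max partition
SET parquet.block.size=1073741824;  -- keep the Parquet writer's target in step

INSERT OVERWRITE TABLE sales_parquet PARTITION (dt)
SELECT id, amount, dt
FROM sales_staging;
```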

 

Re: ERROR: Parquet file should not be split into multiple hdfs-blocks

New Contributor

How can this be done when writing data from a Pig script?

 

Re: ERROR: Parquet file should not be split into multiple hdfs-blocks

Explorer

I'm running into something similar. I'm on 5.4.2, building tables with Hive and then analyzing them with Impala, and I get the same warnings, although the queries execute OK.

Can you please share what you scripted to set the table's block size to 1 GB when each partition stays under 800 MB, as you mention in your post?
