
When inserting data from a partitioned Hive Parquet table into another partitioned Parquet table, the exception hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available is thrown.

Contributor

While inserting from Hive external table P1, stored as Parquet and partitioned on a column (e.g. col A), into another table P2, stored as Parquet with the same number of columns as P1 but partitioned on a different column (e.g. col B), Hive throws a Premature EOF exception.

Exception: hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available.

Any idea what could be causing this issue?

This is an HDP 2.3 cluster with 4 datanodes. The process runs with sufficient map memory and container size.

I have tried running with Tez as well as MapReduce, but I get the same error.
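For reference, a minimal HiveQL sketch of the pattern described above (table names, column names, and locations are placeholders, not the actual schema):

    -- Source table, partitioned on column a
    CREATE EXTERNAL TABLE p1 (b STRING, c STRING)
    PARTITIONED BY (a STRING)
    STORED AS PARQUET
    LOCATION '/data/p1';

    -- Target table with the same columns, but partitioned on column b
    CREATE EXTERNAL TABLE p2 (a STRING, c STRING)
    PARTITIONED BY (b STRING)
    STORED AS PARQUET
    LOCATION '/data/p2';

    -- Dynamic partitioning is needed because the target partition values come from the data
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- The failing step: repartitioning the data on a different column
    INSERT OVERWRITE TABLE p2 PARTITION (b)
    SELECT a, c, b FROM p1;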

Thanks,

Harshal

1 ACCEPTED SOLUTION

Contributor

In the case of Hive on Tez, decreasing tez.grouping.max-size might help you. I faced almost the same problem before; after decreasing tez.grouping.max-size from 1 GB to 256 MB, the problem was mostly (though not completely) resolved.


7 REPLIES

Master Mentor

Why not use ORC? What is the use case that requires Parquet?

Contributor

So you mean this has something to do with Parquet? Parquet has good integration with Spark.


This might be a Parquet problem, but it could also be something else. I have seen some performance and job issues when using Parquet instead of ORC. Have you seen this: https://issues.apache.org/jira/browse/HDFS-8475

What features are you missing regarding Spark with ORC?

I have seen your error before, but in a different context (a query on an ORC table was failing).

Make sure your HDFS (especially the DataNodes) is running and healthy. It might be related to some bad blocks, so make sure the blocks related to your job are OK.

Contributor

Thanks for your reply, Jonas!

I have verified DataNode health and it is all fine; there are no corrupt blocks across the filesystem. I will check by changing the format to ORC.

Contributor

I get the same exception for an ORC Hive table as well.

It looks like this is a generic issue for the case below (a HiveQL sketch follows the steps):

1. Create external table T1 (columns A, B, C) partitioned on column A and stored as ORC. Load the table with a substantial amount of data; in my case, around 85 GB.

2. Create external table T2 (columns A, B, C) partitioned on column B and stored as ORC. Load table T2 from T1 with dynamic partitioning.

Output: Premature EOF exception

Please try it out!
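A hedged HiveQL sketch of step 2, using the same placeholder schema as the Parquet sketch earlier in the thread (only the storage clause changes to STORED AS ORC):

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- Load T2 (partitioned on col B) from T1 (partitioned on col A) with dynamic partitioning
    INSERT OVERWRITE TABLE t2 PARTITION (b)
    SELECT a, c, b FROM t1;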

Contributor

In the case of Hive on Tez, decreasing tez.grouping.max-size might help you. I faced almost the same problem before; after decreasing tez.grouping.max-size from 1 GB to 256 MB, the problem was mostly (though not completely) resolved.
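If it helps, this is a Tez property that can be lowered per Hive session; the value is in bytes, so 256 MB corresponds to 268435456 (shown only to illustrate the setting mentioned above, not as a universally recommended value):

    -- Reduce the maximum size of an input split group for Hive on Tez
    SET tez.grouping.max-size=268435456;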

Contributor

Thanks for the reply!

The issue was resolved by increasing the value of dfs.datanode.max.transfer.threads, to 16000 in my case.

I also increased the ulimit value on each worker node.
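For anyone hitting the same error: dfs.datanode.max.transfer.threads is set in hdfs-site.xml on the DataNodes and needs a DataNode restart to take effect. A sketch of the stanza with the value mentioned above:

    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>16000</value>
    </property>
    <!-- Also raise the open-file limit (e.g. ulimit -n) for the HDFS user on each worker node -->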

Regards,

Harshal