Support Questions
Find answers, ask questions, and share your expertise
When inserting data from a partitioned Hive Parquet table into another partitioned Parquet table, the exception "hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available" is thrown.

New Contributor

While inserting from a Hive external table P1 stored as Parquet (partitioned on a column, e.g. column A) into another table P2 stored as Parquet, with the same columns as P1 but partitioned on a different column (e.g. column B), Hive throws a Premature EOF exception.

Exception: hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available.

Any idea what is causing this issue?

This is an HDP 2.3 cluster with 4 datanodes. The process is running with sufficient map memory and container size.

I have tried running it with both Tez and MapReduce, but I get the same error.

Thanks,

Harshal

1 ACCEPTED SOLUTION

Accepted Solutions

Re: When inserting data from a partitioned Hive Parquet table into another partitioned Parquet table, the exception "hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available" is thrown.

New Contributor

In the case of Hive on Tez, decreasing tez.grouping.max-size might help you. I faced almost the same problem before, and after I decreased tez.grouping.max-size from 1 GB to 256 MB the problem was mostly (though not completely) solved.
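If it helps, a sketch of how this setting can be applied per session from the Hive shell; the 256 MB value (in bytes) is the one suggested above, and the right value for your cluster may differ:

```sql
-- Reduce the maximum Tez split-group size so each task handles less input
-- and opens fewer concurrent output streams (value is in bytes; 256 MB here).
SET tez.grouping.max-size=268435456;
```

The same property can instead be set cluster-wide in the Tez/Hive configuration if the smaller grouping size works well for your workloads.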

7 REPLIES

Mentor

Why not use ORC? What is the use case that requires Parquet?

New Contributor

So you mean this is something to do with Parquet? Parquet has good integration with Spark.

This might be a Parquet problem, but it could also be something else. I have seen some performance and job issues when using Parquet instead of ORC. Have you seen https://issues.apache.org/jira/browse/HDFS-8475?

What features are you missing in Spark's ORC support?

I have seen your error before, but in a different context (a query on an ORC table was failing).

Make sure your HDFS services (especially the DataNodes) are running and healthy. The error might be related to bad blocks, so make sure the blocks used by your job are OK.
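One way to check block health from the command line is HDFS's fsck tool; the path below is a placeholder for wherever your tables live:

```
# Report files, blocks, and their locations under the given path;
# corrupt or under-replicated blocks show up in the summary.
hdfs fsck /apps/hive/warehouse -files -blocks -locations
```

A healthy filesystem reports "The filesystem under path ... is HEALTHY" with zero corrupt blocks.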

New Contributor

Thanks for your reply, Jonas!

I have verified DataNode health and it is all fine; there are no corrupt blocks across the filesystem. I will check by changing the format to ORC.

New Contributor

I get the same exception with an ORC Hive table as well.

It looks like this is a generic issue for the following case:

1. Create external table T1 (columns A, B, C) partitioned on column A, stored as ORC. Load the table with substantial data (around 85 GB in my case).

2. Create external table T2 (columns A, B, C) partitioned on column B, stored as ORC. Load table T2 from T1 with dynamic partitioning.

Output: Premature EOF exception.

Please try it out!
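The two steps above might look roughly like the following HiveQL; the column types and the STRING choice are assumptions for illustration (the original post does not give the schema), and in Hive the partition column is declared in PARTITIONED BY and selected last:

```sql
-- Step 1: source table, partitioned on column a (types assumed).
CREATE EXTERNAL TABLE T1 (b STRING, c STRING)
  PARTITIONED BY (a STRING)
  STORED AS ORC;

-- Step 2: target table with the same columns, partitioned on column b.
CREATE EXTERNAL TABLE T2 (a STRING, c STRING)
  PARTITIONED BY (b STRING)
  STORED AS ORC;

-- Dynamic-partition insert that repartitions the data from a to b;
-- at large data volumes this is where the Premature EOF appears.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE T2 PARTITION (b)
SELECT a, c, b FROM T1;
```

Repartitioning like this means every writer task may hold one open output file per target partition, which multiplies the concurrent stream count on the DataNodes.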

New Contributor

In the case of Hive on Tez, decreasing tez.grouping.max-size might help you. I faced almost the same problem before, and after I decreased tez.grouping.max-size from 1 GB to 256 MB the problem was mostly (though not completely) solved.

New Contributor

Thanks for the reply!

The issue was resolved by increasing the value of dfs.datanode.max.transfer.threads (to 16000 in my case), and by increasing the ulimit value on each worker node.
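For reference, this property lives in hdfs-site.xml on each DataNode (a DataNode restart is needed for it to take effect); the 16000 value is just what worked in this case:

```xml
<!-- hdfs-site.xml on each DataNode: raise the cap on concurrent
     block send/receive threads so many simultaneous writers
     (e.g. dynamic-partition inserts) do not hit "Premature EOF". -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>16000</value>
</property>
```

Raising the open-file ulimit for the HDFS service user on each worker node goes hand in hand with this, since each transfer thread can hold file descriptors open.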

Regards,

Harshal
