Created 02-24-2016 11:49 AM
While inserting from a Hive external table P1 stored as Parquet (partitioned on a column, e.g. col A) into another table P2, also stored as Parquet and having the same columns as P1 but partitioned on a different column (e.g. col B), Hive throws a Premature EOF exception.
Exception: hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available.
Any idea what the cause of this issue is?
This is an HDP 2.3 cluster with 4 datanodes. The process is running with sufficient map memory and container size.
I have tried running with Tez as well as MapReduce, but I get the same error.
Thanks,
Harshal
Created 02-24-2016 12:06 PM
Why not use ORC? What's the use case that requires Parquet?
Created 02-25-2016 05:26 AM
So you mean this has something to do with Parquet? Parquet has good integration with Spark.
Created 02-25-2016 06:28 AM
This might be a Parquet problem, but it could also be something else. I have seen some performance and job issues when using Parquet instead of ORC. Have you seen this: https://issues.apache.org/jira/browse/HDFS-8475
What features are you missing regarding Spark and ORC?
I have seen your error before, but in a different context (a query on an ORC table was failing).
Make sure your HDFS services (especially the DataNodes) are running and healthy. It might be related to some bad blocks, so make sure the blocks related to your job are OK; one way to check is shown below.
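A quick way to check block health, assuming the table data lives under a hypothetical warehouse path:

# List any corrupt blocks across the filesystem
hdfs fsck / -list-corruptfileblocks

# Inspect the blocks backing the table data (path is illustrative)
hdfs fsck /apps/hive/warehouse/p1 -files -blocks -locations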
Created 02-26-2016 05:42 AM
Thanks for your reply, Jonas!
I have verified datanode health and it's all fine; there are no corrupt blocks across the filesystem. I will check by changing the format to ORC.
Created 02-26-2016 07:06 AM
The same exception occurs for an ORC Hive table as well.
It looks like this is a generic issue for the following case:
1. Create external table T1 (cols A, B, C) partitioned on col A, stored as ORC. Load the table with substantial data; in my case, around 85 GB.
2. Create external table T2 (cols A, B, C) partitioned on col B, stored as ORC. Load table T2 from T1 with dynamic partitioning.
Output: Premature EOF exception.
Please try it out! A sketch of the repro is below.
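A minimal sketch of the case above; the table and column names and the LOCATION paths are illustrative, not my exact DDL:

-- Source table, partitioned on col_a
CREATE EXTERNAL TABLE t1 (col_b STRING, col_c STRING)
PARTITIONED BY (col_a STRING)
STORED AS ORC
LOCATION '/data/t1';

-- Target table with the same columns, but partitioned on col_b instead
CREATE EXTERNAL TABLE t2 (col_a STRING, col_c STRING)
PARTITIONED BY (col_b STRING)
STORED AS ORC
LOCATION '/data/t2';

-- Dynamic-partition insert; the partition column must come last in the SELECT
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE t2 PARTITION (col_b)
SELECT col_a, col_c, col_b FROM t1;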
Created 11-07-2016 08:22 AM
In the case of Hive on Tez, decreasing tez.grouping.max-size might help. I faced almost the same problem before; decreasing tez.grouping.max-size from 1 GB to 256 MB mostly (though not completely) solved it.
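For example, set it in the Hive session before running the insert (256 MB expressed in bytes):

-- Cap the Tez grouped split size at 256 MB
SET tez.grouping.max-size=268435456;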
Created 11-08-2016 06:53 AM
Thanks for the reply!
The issue was resolved by increasing the value of dfs.datanode.max.transfer.threads, to 16000 in my case.
Also by increasing the ulimit value on each worker node. The changes are sketched below.
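A sketch of the two changes, assuming hdfs-site.xml is edited directly (on HDP this would normally go through Ambari); the open-files limit value is illustrative:

<!-- hdfs-site.xml on each DataNode: raise the data transfer thread cap -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>16000</value>
</property>

# /etc/security/limits.conf on each worker node (illustrative limit)
hdfs  -  nofile  65536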
Regards,
Harshal