
java.io.EOFException: Reached the end of stream with 340 bytes left to read

New Contributor

Hi,

I am using the following versions in my CDP cluster. When I create a simple Parquet table from Hue and store it as an external table on an S3 Express bucket, querying it fails with an 'EOFException'.

1. Hive 3.1.3000.7.2.18.200-39, Hadoop 3.1.1.7.2.18.200-39

2. Data Hub CM version 7.12.0.200, CM runtime version: 7.2.18-1.cdh7.2.18.p200.54625612

I have tried the following settings, but they didn't help; a rough sketch of my repro steps follows the list. Could anyone help here? Thank you.

1. Data cluster -> CM -> hdfs -> add fs.s3a.experimental.input.fadvise=sequential

2. Data cluster -> CM -> spark3_on_yarn -> add spark.sql.parquet.enableVectorizedReader=false
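
For reference, here is a rough sketch of the repro. The S3 Express bucket name and location below are placeholders, not my real ones; the table schema matches the one in the log:

-- Minimal repro sketch; 's3a://my-bucket--use1-az4--x-s3/...' is a placeholder path.
CREATE EXTERNAL TABLE small (id STRING)
STORED AS PARQUET
LOCATION 's3a://my-bucket--use1-az4--x-s3/warehouse/small/';

INSERT INTO small VALUES ('1');

SELECT * FROM small;  -- fails with the EOFException shown below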

 

INFO  : Compiling command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c): select * from small
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:small.id, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c); Time taken: 0.173 seconds
INFO  : Executing command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c): select * from small
INFO  : Completed executing command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c); Time taken: 0.009 seconds
INFO  : OK
ERROR : Failed with exception java.io.IOException:java.io.EOFException: Reached the end of stream with 340 bytes left to read
java.io.IOException: java.io.EOFException: Reached the end of stream with 340 bytes left to read
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:642)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:549)
	at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:217)
	at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:114)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:820)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:550)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:544)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.EOFException: Reached the end of stream with 340 bytes left to read
	at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:104)
	at org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
	at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:584)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:462)
	at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getParquetMetadata(ParquetRecordReaderBase.java:181)
	at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.setupMetadataAndParquetSplit(ParquetRecordReaderBase.java:87)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:59)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93)
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:789)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:353)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:580)
	... 21 more

 

3 Replies

Community Manager

@zhaiziha Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Hive / CDP experts @Shmoo, @venkatsambath, and @mszurap, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator



Expert Contributor

@zhaiziha 

Are you running this query on a Data Hub cluster or a Hive Virtual Warehouse (VW)?
Please try your SELECT statement in Beeline on the Data Hub cluster as follows, and let us know how it goes:

SET hive.server2.logging.operation.level=VERBOSE;
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SET fs.s3a.experimental.input.fadvise=random;
SET fs.s3a.readahead.range=1024K;
SET parquet.enable.dictionary=false;
SELECT * FROM small;

Should that fail, can you provide the Beeline output and simple repro steps?
This looks like some form of HADOOP-16109, where Parquet reads over S3A hit an unexpected end of stream before the requested range is fully read.
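
One more hedged diagnostic, assuming the table is named small as in your log: the stack trace fails inside FetchOperator, i.e. the HiveServer2 fetch-task path, so forcing the scan into a regular Tez/MR task can help isolate whether the problem is specific to that path:

-- Diagnostic sketch: disable fetch-task conversion so the scan runs in a
-- container task instead of FetchOperator (the frame seen in the stack trace).
SET hive.fetch.task.conversion=none;
SELECT COUNT(*) FROM small;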

Community Manager

@zhaiziha Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.


Regards,

Diana Torres,
Community Moderator

