
zhaiziha · 08-16-2024

Hi,

I am using the following versions in my CDP cluster. When I create a simple Parquet table from Hue and store it as an external table on an S3 Express bucket, I get an 'EOFException'.

1. Hive 3.1.3000.7.2.18.200-39, Hadoop 3.1.1.7.2.18.200-39

2. Data hub CM version 7.12.0.200, CM runtime version: 7.2.18-1.cdh7.2.18.p200.54625612

I have tried the following settings, but they didn't help. Could anyone help here? Thank you.

1. Data cluster -> CM -> hdfs -> add fs.s3a.experimental.input.fadvise=sequential

2. Data cluster -> CM -> spark3_on_yarn -> add spark.sql.parquet.enableVectorizedReader=false

INFO  : Compiling command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c): select * from small
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:small.id, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c); Time taken: 0.173 seconds
INFO  : Executing command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c): select * from small
INFO  : Completed executing command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c); Time taken: 0.009 seconds
INFO  : OK
ERROR : Failed with exception java.io.IOException:java.io.EOFException: Reached the end of stream with 340 bytes left to read
java.io.IOException: java.io.EOFException: Reached the end of stream with 340 bytes left to read
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:642)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:549)
	at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:217)
	at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:114)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:820)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:550)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:544)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.EOFException: Reached the end of stream with 340 bytes left to read
	at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:104)
	at org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
	at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:584)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:462)
	at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getParquetMetadata(ParquetRecordReaderBase.java:181)
	at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.setupMetadataAndParquetSplit(ParquetRecordReaderBase.java:87)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:59)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93)
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:789)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:353)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:580)
	... 21 more

DianaTorres · 08-16-2024

@zhaiziha Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Hive / CDP experts @Shmoo @venkatsambath @mszurap who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.

Regards,

Diana Torres,
Community Moderator

Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Community Guidelines
How to use the forum

JoseManuel · 08-16-2024

@zhaiziha

Are you running this query on DataHub or Hive VW?
Please try your SELECT statement in Beeline from DataHub as follows, and let us know how it works for you:

SET hive.server2.logging.operation.level=VERBOSE;
SET hive.input.format = org.apache.hadoop.hive.ql.io.HiveInputFormat;
SET fs.s3a.experimental.input.fadvise=random;
SET fs.s3a.readahead.range=1024K;
SET parquet.enable.dictionary=false;
SELECT * FROM small;

Should that fail, can you provide the Beeline output and simple repro steps?
This looks like some form of HADOOP-16109.
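
For anyone reading the trace above: the failure is in ParquetFileReader.readFooter, inside a readFully call, which means the input stream reported end-of-data while 340 bytes of the footer were still expected. The Python sketch below is only a simplified model of that read path, not the parquet-mr or S3A code. The names read_fully, read_footer, build_file and ShortStream are hypothetical, and the cutoff value is chosen purely so the message matches the one in the log. It does not claim to show the actual root cause on S3 Express.

```python
import io
import struct

MAGIC = b"PAR1"  # Parquet files begin and end with this 4-byte magic


def read_fully(stream, length):
    """Read exactly `length` bytes, or raise EOFError if the stream ends early."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # the stream claims there is no more data
            raise EOFError(
                f"Reached the end of stream with {remaining} bytes left to read"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_footer(stream, file_len):
    """Return the raw footer bytes of a Parquet-style file."""
    # The last 8 bytes are a 4-byte little-endian footer length plus the magic.
    stream.seek(file_len - 8)
    tail = read_fully(stream, 8)
    if tail[4:] != MAGIC:
        raise ValueError("bad trailing magic, not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])
    # Seek back to the start of the footer and read all of it.
    stream.seek(file_len - 8 - footer_len)
    return read_fully(stream, footer_len)


def build_file(footer):
    """Build a tiny Parquet-shaped byte string (assumption: dummy 16-byte body)."""
    body = b"\x00" * 16
    return MAGIC + body + footer + struct.pack("<I", len(footer)) + MAGIC


class ShortStream(io.BytesIO):
    """Hypothetical stream that stops returning data at `cutoff` for reads
    that start at or before it, imitating a premature end of stream."""

    def __init__(self, data, cutoff):
        super().__init__(data)
        self.cutoff = cutoff

    def read(self, size=-1):
        pos = self.tell()
        if pos <= self.cutoff:
            limit = self.cutoff - pos
            size = limit if size < 0 else min(size, limit)
        return super().read(size)


if __name__ == "__main__":
    data = build_file(b"m" * 500)
    print(len(read_footer(io.BytesIO(data), len(data))))  # healthy stream
    try:
        read_footer(ShortStream(data, cutoff=180), len(data))
    except EOFError as exc:
        print(exc)
```

In this model the 8-byte tail read succeeds, so the footer length and magic look valid, and only the seek back to the footer start followed by the full read comes up short. That is the same shape as the trace, which is why the fadvise and readahead settings above are worth testing.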

DianaTorres · 08-23-2024

@zhaiziha Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

Regards,

Diana Torres,
Community Moderator
