Created 08-16-2024 08:54 AM
Hi,
I am using the following versions in my CDP cluster. When I create a simple Parquet table from Hue and store it as an external table on an S3 Express bucket, querying it fails with an 'EOFException'.
1. Hive 3.1.3000.7.2.18.200-39, Hadoop 3.1.1.7.2.18.200-39
2. Data hub CM version 7.12.0.200, CM runtime version: 7.2.18-1.cdh7.2.18.p200.54625612
I have tried the following settings, but they didn't help. Could anyone help here? Thank you.
1. Data cluster -> CM -> hdfs -> add fs.s3a.experimental.input.fadvise=sequential
2. Data cluster -> CM -> spark3_on_yarn -> add spark.sql.parquet.enableVectorizedReader=false
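For reference, here is a minimal sketch of the repro (the table and column names match the log below; the bucket path is a placeholder, not my real location):

-- Hypothetical repro sketch; replace the LOCATION with your own S3 Express bucket path.
CREATE EXTERNAL TABLE small (id STRING)
STORED AS PARQUET
LOCATION 's3a://example-bucket--use1-az4--x-s3/warehouse/small/';

INSERT INTO small VALUES ('1');

-- This is the statement that fails with the EOFException:
SELECT * FROM small;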
INFO : Compiling command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c): select * from small
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:small.id, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c); Time taken: 0.173 seconds
INFO : Executing command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c): select * from small
INFO : Completed executing command(queryId=hive_20240816153040_ebe87870-3c90-4f07-84dc-b2b6e354520c); Time taken: 0.009 seconds
INFO : OK
ERROR : Failed with exception java.io.IOException:java.io.EOFException: Reached the end of stream with 340 bytes left to read
java.io.IOException: java.io.EOFException: Reached the end of stream with 340 bytes left to read
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:642)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:549)
    at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:217)
    at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:114)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:820)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:550)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:544)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
    at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.EOFException: Reached the end of stream with 340 bytes left to read
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:104)
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
    at org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:584)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:536)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:530)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:478)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:462)
    at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getParquetMetadata(ParquetRecordReaderBase.java:181)
    at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.setupMetadataAndParquetSplit(ParquetRecordReaderBase.java:87)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:59)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93)
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:789)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:353)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:580)
    ... 21 more
Created 08-16-2024 01:15 PM
@zhaiziha Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Hive / CDP experts @Shmoo @venkatsambath @mszurap who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,
Created 08-16-2024 02:12 PM
Are you running this query on DataHub or Hive VW?
Please try your SELECT statement on Beeline from DataHub as follows, and let us know how it works for you:
SET hive.server2.logging.operation.level=VERBOSE;
SET hive.input.format = org.apache.hadoop.hive.ql.io.HiveInputFormat;
SET fs.s3a.experimental.input.fadvise=random;
SET fs.s3a.readahead.range=1024K;
SET parquet.enable.dictionary=false;
SELECT * FROM small;
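For context on what these toggles probe: VERBOSE operation logging surfaces the full read path in the Beeline output, plain HiveInputFormat rules out split-combining as a factor, fadvise=random optimizes the S3A input stream for Parquet's seek-heavy, footer-first read pattern, and the larger readahead range changes how far the stream reads ahead on a seek, which is where short-read bugs tend to show up.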
Should that fail, can you provide the beeline output and simple repro steps?
This seems like some form of HADOOP-16109 (Parquet reads over S3A hitting an unexpected EOF).
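If the session-level overrides don't change anything, one more hedged check (assuming the table is named small, as in your log): your stack trace fails inside HiveServer2's FetchOperator, so forcing the read through a regular execution task instead of the fetch task can tell us whether the bug is specific to that path.

SET hive.fetch.task.conversion=none;  -- disable the HS2 fetch-task shortcut (diagnostic only, not a fix)
SELECT * FROM small;
-- If this succeeds while the plain fetch-task SELECT fails, the short read is isolated to the fetch-task read path.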
Created 08-23-2024 10:21 AM
@zhaiziha Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres,