Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Reading data from Hive External Table on Parquet files fails with error /.metadata/descriptor.properties is not a Parquet file

Frequent Visitor

Hello Experts, I imported some sample data from an RDBMS into Hadoop using Sqoop.

Format: Parquet with Snappy compression. I am running Hive on an EMR cluster - Hive 2.3.2-amzn-2, Sqoop 1.4.6.
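The import was run with a command along these lines (the connection string, credentials, and table name below are placeholders, not the exact values used):

```shell
# Hypothetical Sqoop import producing Parquet with Snappy compression.
# Note: Sqoop's --as-parquetfile path uses the Kite SDK, which also writes
# a .metadata directory (including descriptor.properties) into the target dir.
sqoop import \
  --connect jdbc:mysql://rdbms-host:3306/sampledb \
  --username sample_user -P \
  --table sample_table \
  --target-dir /poc/test_table \
  --as-parquetfile \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec
```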

When I try to create a Hive external table and read the data, I see the error below.

Has anyone encountered this issue and resolved it?

I appreciate your help on this.

  • Bad status for request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\x9f\x1f\x8e\xfde\xe8E\x8f\x941\xc6\x93%\xec[A', guid='\xc2\xc83/\xea\x9aK\xfb\x833\x1f\xfa\x10\xdd\x88\xaa')), orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0, errorMessage='java.io.IOException: java.lang.RuntimeException: hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]', sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.lang.RuntimeException: hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]:25:24', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:499', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:307', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:878', 'sun.reflect.GeneratedMethodAccessor15:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:498', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1836', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 
'com.sun.proxy.$Proxy35:fetchResults::-1', 'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:559', 'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:751', 'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717', 'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*java.io.IOException:java.lang.RuntimeException: hdfs://i/<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]:29:4', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:521', 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:428', 'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:147', 'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2208', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:494', '*java.lang.RuntimeException:hdfs:///<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file. 
expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]:38:9', 'org.apache.parquet.hadoop.ParquetFileReader:readFooter:ParquetFileReader.java:423', 'org.apache.parquet.hadoop.ParquetFileReader:readFooter:ParquetFileReader.java:386', 'org.apache.parquet.hadoop.ParquetFileReader:readFooter:ParquetFileReader.java:372', 'org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase:getSplit:ParquetRecordReaderBase.java:79', 'org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper:<init>:ParquetRecordReaderWrapper.java:75', 'org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper:<init>:ParquetRecordReaderWrapper.java:60', 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat:getRecordReader:MapredParquetInputFormat.java:75', 'org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit:getRecordReader:FetchOperator.java:695', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:333', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:459'], statusCode=3), results=None, hasMoreRows=None)
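For context, a valid Parquet file must end with the 4-byte magic number "PAR1", i.e. bytes [80, 65, 82, 49]. The bytes the reader found instead, [117, 101, 116, 10], decode to plain text ("uet" plus a newline) - the tail of a text properties file, not Parquet data. A minimal sketch of the check the reader performs (the file names used here are hypothetical):

```python
import os

PARQUET_MAGIC = bytes([80, 65, 82, 49])  # decodes to b"PAR1"

def ends_with_parquet_magic(path):
    """Return True if the last 4 bytes of the file are the Parquet magic 'PAR1'."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        return f.read(4) == PARQUET_MAGIC

# The bytes reported in the error decode to ordinary text, not Parquet data:
found = bytes([117, 101, 116, 10])
print(PARQUET_MAGIC)  # b'PAR1'
print(found)          # b'uet\n' - the tail of a plain-text properties file
```

This is why any non-Parquet file sitting in the table directory makes the query fail: the footer check runs against every file Hive picks up from the table location.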
1 ACCEPTED SOLUTION

Expert Contributor

@cskbhatt, I assume the external table location is "hdfs://<emr node>:8020/poc/test_table/".

This issue happens because hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file, but it exists inside the table folder.

When Hive's ParquetRecordReader tries to read this file, it throws the above exception. Remove all non-Parquet files from the table location and retry your query.
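To illustrate the cleanup, assuming the table location above, something like the following would move the offending metadata directory out of the table path (the /tmp destination is an arbitrary choice; adjust paths to your cluster):

```shell
# Show everything under the table location, including the .metadata
# directory written by Sqoop's Parquet (Kite SDK) import
hdfs dfs -ls hdfs://<emr node>:8020/poc/test_table/

# Move the non-Parquet metadata out of the table path so Hive's
# ParquetRecordReader only encounters Parquet data files
hdfs dfs -mv hdfs://<emr node>:8020/poc/test_table/.metadata /tmp/test_table_metadata
```

Moving (rather than deleting) the directory keeps the Kite metadata around in case another tool still needs it.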

