
Reading data from Hive External Table on Parquet files fails with error /.metadata/descriptor.properties is not a Parquet file

Contributor

Hello Experts, I imported some sample data from an RDBMS into Hadoop using Sqoop.

Format: Parquet with Snappy compression. I am running Hive on an EMR cluster (Hive 2.3.2-amzn-2, Sqoop 1.4.6).

When I try to create a Hive external table and read the data, I see the error below.

Has anyone encountered this issue and resolved it?

I appreciate your help on this.

  • Bad status for request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\x9f\x1f\x8e\xfde\xe8E\x8f\x941\xc6\x93%\xec[A', guid='\xc2\xc83/\xea\x9aK\xfb\x833\x1f\xfa\x10\xdd\x88\xaa')), orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0, errorMessage='java.io.IOException: java.lang.RuntimeException: hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]', sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.lang.RuntimeException: hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]:25:24', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:499', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:307', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:878', 'sun.reflect.GeneratedMethodAccessor15:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:498', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1836', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 
'com.sun.proxy.$Proxy35:fetchResults::-1', 'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:559', 'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:751', 'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717', 'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*java.io.IOException:java.lang.RuntimeException: hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]:29:4', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:521', 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:428', 'org.apache.hadoop.hive.ql.exec.FetchTask:fetch:FetchTask.java:147', 'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2208', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:494', '*java.lang.RuntimeException:hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file. 
expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]:38:9', 'org.apache.parquet.hadoop.ParquetFileReader:readFooter:ParquetFileReader.java:423', 'org.apache.parquet.hadoop.ParquetFileReader:readFooter:ParquetFileReader.java:386', 'org.apache.parquet.hadoop.ParquetFileReader:readFooter:ParquetFileReader.java:372', 'org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase:getSplit:ParquetRecordReaderBase.java:79', 'org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper:<init>:ParquetRecordReaderWrapper.java:75', 'org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper:<init>:ParquetRecordReaderWrapper.java:60', 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat:getRecordReader:MapredParquetInputFormat.java:75', 'org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit:getRecordReader:FetchOperator.java:695', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:333', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:459'], statusCode=3), results=None, hasMoreRows=None)
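Decoding the two byte arrays in the error message hints at the cause: Hive expects every file in the table directory to end with the Parquet footer magic, but descriptor.properties is a plain-text file. A quick check in Python:

```python
# The magic number Hive expects at the tail of every Parquet file:
print(bytes([80, 65, 82, 49]))     # b'PAR1'

# What was actually found at the tail of descriptor.properties:
print(bytes([117, 101, 116, 10]))  # b'uet\n' -- the end of a text line,
                                   # not a Parquet footer
```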
1 ACCEPTED SOLUTION

Expert Contributor

@cskbhatt, I assume the external table location is "hdfs://<emr node>:8020/poc/test_table/".

This issue occurs because hdfs://<emr node>:8020/poc/test_table/.metadata/descriptor.properties is not a Parquet file, yet it exists inside the table folder.

When Hive's ParquetRecordReader tries to read this file, it throws the above exception. Remove all non-Parquet files from the table location and retry your query.
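To find which files will trip up Hive, you can check the 4-byte magic at the tail of each file: valid Parquet files end with the ASCII bytes "PAR1". A minimal local sketch (the function names are mine, and it reads the local filesystem, so on EMR you would first copy the files down with `hdfs dfs -get`):

```python
import os

PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49], as reported in the error

def ends_with_parquet_magic(path):
    """True if the file's last 4 bytes are the Parquet footer magic."""
    if os.path.getsize(path) < 4:
        return False
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        return f.read(4) == PARQUET_MAGIC

def non_parquet_files(table_dir):
    """List files under table_dir (recursively) that lack the magic."""
    bad = []
    for root, _dirs, files in os.walk(table_dir):
        for name in files:
            path = os.path.join(root, name)
            if not ends_with_parquet_magic(path):
                bad.append(path)
    return bad
```

Once the offending files are identified, the extra directory can be removed on the cluster with `hdfs dfs -rm -r` and the query retried.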

