
Hive - Error when selecting data from external hive table on parquet file - does not contain requested field: optional boolean

Explorer

Hello, I created an external Hive table on a Parquet file, and when I run a select *, I see the error below.

I am running Hive on an EMR cluster (Hive 2.3.2-amzn-2). I verified that all the fields exist in the Parquet file.

Has anyone encountered this issue?

Any suggestions would be appreciated.

does not contain requested field: optional boolean prepaidFlag:25:24',
'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:499',
'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:307',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:878',
'sun.reflect.GeneratedMethodAccessor15:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:498',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:422',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1836',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy35:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:559',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:751',
'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717',
'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624',
'java.lang.Thread:run:Thread.java:748',
'*java.io.IOException:java.lang.IllegalStateException: Group type

7 REPLIES

Please share the CREATE TABLE statement for the Hive table.

Explorer

Hi Venkat, below is the create table statement.

The Parquet file has all the columns listed, and the data types match the schema.

Create external table Hive_Parquet_Test( 
  statement_Id int,
  statement_MessageId string,
  prepaidFlag boolean,
  item_Count int,
  first_Name string,
  last_Name string
  )
STORED AS PARQUET
LOCATION 's3://bucket_name/hive_parq_test'

Is the source file also in Parquet format?

Explorer

Yes, it is Parquet. Here is some additional information.

Environment:

  • Running Hive on an AWS EMR (emr-5.13.0) cluster - Hive 2.3.2-amzn-2.
  • Verified that all the fields exist in the Parquet file using parquet-tools.
  • The Parquet file is generated from nested JSON using the fastparquet Python library.
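
The "all fields exist" check can be made repeatable by comparing the table's columns against the schema text that parquet-tools prints. The following is a minimal sketch under stated assumptions: the schema dump is a made-up sample in the usual `message ... { optional <type> <name>; }` layout, not the actual file from this thread (last_Name is left out on purpose), and `parse_parquet_schema` and `missing_columns` are hypothetical helper names. It only handles flat, non-nested schemas.

```python
import re

# Hypothetical sample of `parquet-tools schema` output; NOT the poster's real file.
# last_Name is deliberately omitted to show what a mismatch looks like.
SCHEMA_DUMP = """
message schema {
  optional int32 statement_Id;
  optional binary statement_MessageId (UTF8);
  optional boolean prepaidFlag;
  optional int32 item_Count;
  optional binary first_Name (UTF8);
}
"""

# Columns from the CREATE EXTERNAL TABLE statement in this thread.
HIVE_COLUMNS = ["statement_Id", "statement_MessageId", "prepaidFlag",
                "item_Count", "first_Name", "last_Name"]

# Matches lines such as "  optional int32 statement_Id;"
FIELD_RE = re.compile(r"^\s*(optional|required|repeated)\s+(\w+)\s+([\w.]+)")

def parse_parquet_schema(dump):
    """Return {lowercased field name: physical type} for a flat schema dump."""
    fields = {}
    for line in dump.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(3).lower()] = m.group(2)
    return fields

def missing_columns(hive_columns, parquet_fields):
    """Hive stores column names in lowercase, so compare case-insensitively."""
    return [c for c in hive_columns if c.lower() not in parquet_fields]

fields = parse_parquet_schema(SCHEMA_DUMP)
print(missing_columns(HIVE_COLUMNS, fields))
```

An empty list means every table column has a same-named field in the file. It does not prove that the types or nesting also match, so those still need to be compared separately.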

Get the schema by running parquet-tools schema <filename> against the file in HDFS, use that schema to build the external table in Hive, and specify the Parquet SerDe when you create the table.
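
Building the table from the extracted schema can be sketched as a small script that turns the parquet-tools schema text into a CREATE EXTERNAL TABLE statement. This is only an illustration, not a definitive tool: the schema dump is a hypothetical sample, the type mapping in `TYPE_MAP` is an assumption that covers a few common flat types, and `build_ddl` is a made-up helper name. Nested groups and types outside the mapping are not handled.

```python
import re

# Hypothetical `parquet-tools schema` output for a flat file (illustrative only).
SCHEMA_DUMP = """
message schema {
  optional int32 statement_Id;
  optional binary statement_MessageId (UTF8);
  optional boolean prepaidFlag;
}
"""

# Assumed mapping from Parquet physical types to Hive column types.
TYPE_MAP = {"int32": "int", "int64": "bigint", "boolean": "boolean",
            "float": "float", "double": "double"}

# Captures physical type, field name, and an optional annotation such as (UTF8).
FIELD_RE = re.compile(r"^\s*(?:optional|required)\s+(\w+)\s+(\w+)\s*(\(\w+\))?\s*;")

def hive_type(physical, annotation):
    """Map one Parquet field to a Hive type; raises KeyError for unmapped types."""
    if physical == "binary":
        # Text columns are binary with a UTF8 (or STRING) annotation.
        return "string" if annotation in ("(UTF8)", "(STRING)") else "binary"
    return TYPE_MAP[physical]

def build_ddl(dump, table, location):
    """Build a CREATE EXTERNAL TABLE statement from a flat schema dump."""
    cols = []
    for line in dump.splitlines():
        m = FIELD_RE.match(line)
        if m:
            cols.append(f"  {m.group(2)} {hive_type(m.group(1), m.group(3))}")
    return (f"CREATE EXTERNAL TABLE {table} (\n" + ",\n".join(cols) +
            f"\n)\nSTORED AS PARQUET\nLOCATION '{location}';")

ddl = build_ddl(SCHEMA_DUMP, "Hive_Parquet_Test", "s3://bucket_name/hive_parq_test")
print(ddl)
```

Generating the column list from the file itself avoids hand-typed name or type mismatches. The result should still be reviewed before running it in Hive.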

Explorer

Thank you again for your input.

Yes, I extracted the schema from the Parquet file and created the external table.

I am not clear on your comment "use parquet serde at time of creation of hive table." Based on the Hive documentation, I am using STORED AS PARQUET (Hive 2.3.2-amzn-2).

Also, I am not sure whether the conversion using the fastparquet Python library is causing it or whether this is a bug in Hive.

I meant that you should create the Hive table with the SerDe specified explicitly.

Syntax:

create table <dbname>.<tablename> (a string, b string, c string, d string, e string, f string, g string, h string, i string, j string)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";