Hive - Error when selecting data from external hive table on parquet file - does not contain requested field: optional boolean

Contributor

Hello, I created an external Hive table on a parquet file, and when I run a select * I see the error below.

I am running Hive on an EMR cluster - Hive 2.3.2-amzn-2. I verified that all the fields exist in the parquet file.

Has anyone encountered this issue?

Any suggestions would be appreciated.

  does not contain requested field: optional boolean prepaidFlag:25:24
  org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:499
  org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:307
  org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:878
  sun.reflect.GeneratedMethodAccessor15:invoke::-1
  sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43
  java.lang.reflect.Method:invoke:Method.java:498
  org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78
  org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36
  org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63
  java.security.AccessController:doPrivileged:AccessController.java:-2
  javax.security.auth.Subject:doAs:Subject.java:422
  org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1836
  org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59
  com.sun.proxy.$Proxy35:fetchResults::-1
  org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:559
  org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:751
  org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717
  org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702
  org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39
  org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39
  org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56
  org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286
  java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149
  java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624
  java.lang.Thread:run:Thread.java:748
  *java.io.IOException:java.lang.IllegalStateException: Group type
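
For context, the failing statement is just a full scan of the external table; a minimal repro sketch, using the table name from the CREATE TABLE shared further down in the thread:

-- Full scan that produces the 'does not contain requested field: optional boolean prepaidFlag' error above
SELECT * FROM Hive_Parquet_Test;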



Please share the CREATE TABLE statement for the Hive table.

Contributor

Hi Venkat, below is the CREATE TABLE statement.

The parquet file has all the columns listed, and the data types match the schema.

CREATE EXTERNAL TABLE Hive_Parquet_Test (
  statement_Id int,
  statement_MessageId string,
  prepaidFlag boolean,
  item_Count int,
  first_Name string,
  last_Name string
)
STORED AS PARQUET
LOCATION 's3://bucket_name/hive_parq_test';
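
As a cross-check on the Hive side, describing the table shows which column names and types Hive will request from the parquet file, plus the SerDe and location it registered; a minimal sketch against the table above:

-- Lists the columns/types Hive registered for the table, along with its SerDe, input/output formats, and location
DESCRIBE FORMATTED Hive_Parquet_Test;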

avatar

Is the source file also in parquet format?

Contributor

Yes, it is parquet. Here is some additional information.

Environment:

  • Running Hive on an AWS EMR (emr-5.13.0) cluster - Hive 2.3.2-amzn-2.
  • Verified that all the fields exist in the parquet file using parquet-tools.
  • The parquet file is generated from nested JSON using the fast-parquet Python library.


Get the schema by running parquet-tools schema <filename> against the file in HDFS, use that schema to build the external table in Hive, and specify the Parquet SerDe when creating the table.

Contributor

Thank you again for your input.

Yes, I extracted the schema from the parquet file and created the external table.

I am not clear on your comment "use parquet serde at time of creation of hive table." Based on the Hive documentation, I am using STORED AS PARQUET (Hive 2.3.2-amzn-2).

Also, I am not sure whether the conversion using the fast-parquet Python library is causing this or whether it is a bug in Hive.


I meant that you should create the Hive table with the SerDe specified explicitly.

Syntax:

CREATE TABLE <dbname>.<tablename> (a string, b string, c string, d string, e string, f string, g string, h string, i string, j string)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
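
For what it's worth, the parquet.hive.* class names above come from the older standalone parquet-hive bindings; on Hive 2.x the bundled Parquet classes usually live under org.apache.hadoop.hive.ql.io.parquet. A sketch of the same explicit-SerDe DDL applied to the table from this thread, assuming those newer class names (the table name here is just illustrative):

-- Same columns and S3 location as the table earlier in the thread, but with the SerDe
-- and input/output formats written out explicitly instead of STORED AS PARQUET
CREATE EXTERNAL TABLE Hive_Parquet_Test_Serde (
  statement_Id int,
  statement_MessageId string,
  prepaidFlag boolean,
  item_Count int,
  first_Name string,
  last_Name string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://bucket_name/hive_parq_test';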