Created 05-29-2018 09:39 PM
Hello, I created an external Hive table on a Parquet file, and when I run a select *, I see the error below.
I am running Hive on an EMR cluster (Hive 2.3.2-amzn-2). I verified that all the fields exist in the Parquet file.
Has anyone encountered this issue?
Any suggestions would be appreciated.
does not contain requested field: optional boolean prepaidFlag:25:24', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:499', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:307', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:878', 'sun.reflect.GeneratedMethodAccessor15:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:498', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1836', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy35:fetchResults::-1', 'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:559', 'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:751', 'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1717', 'org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1702', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*java.io.IOException:java.lang.IllegalStateException: Group type
Created 05-30-2018 05:28 PM
Please share the CREATE TABLE statement for the Hive table.
Created 05-30-2018 05:49 PM
Hi Venkat, below is the create table statement.
The Parquet file has all the columns listed, and the data types match the schema.
Create external table Hive_Parquet_Test(
  statement_Id int,
  statement_MessageId string,
  prepaidFlag boolean,
  item_Count int,
  first_Name string,
  last_Name string
)
STORED AS PARQUET
LOCATION 's3://bucket_name/hive_parq_test'
Created 05-30-2018 06:07 PM
Is the source file also in Parquet format?
Created 05-30-2018 06:30 PM
Yes, it is Parquet. Here is some additional information:
Environment:
Created 05-30-2018 06:35 PM
Get the schema by running parquet-tools schema <filename> against the file in HDFS, use that schema to build the external table in Hive, and specify the Parquet SerDe when you create the Hive table.
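As a rough illustration of that step, here is a minimal Python sketch that turns the text printed by parquet-tools schema into a Hive DDL statement. It is only a sketch under assumptions: the function names hive_columns and create_table_ddl are made up for this example, the type mapping covers only flat primitive fields, and the sample schema text is a guess at what the file in this thread might print, not actual output.

```python
import re

# Assumed mapping from Parquet primitive types to Hive column types.
# Only a handful of flat types are covered; extend as needed.
PARQUET_TO_HIVE = {
    "boolean": "boolean",
    "int32": "int",
    "int64": "bigint",
    "float": "float",
    "double": "double",
}

# Matches a flat field line such as "optional binary first_Name (UTF8);"
FIELD_RE = re.compile(
    r"^\s*(optional|required)\s+(\w+)\s+(\w+)\s*(\((\w+)\))?\s*;\s*$"
)


def hive_columns(schema_text):
    """Return (name, hive_type) pairs for the flat fields in the schema text."""
    columns = []
    for line in schema_text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            # Skips "message ... {", "}", and anything nested or repeated.
            continue
        ptype, name, annotation = match.group(2), match.group(3), match.group(5)
        if ptype == "binary":
            hive_type = "string" if annotation == "UTF8" else "binary"
        elif ptype in PARQUET_TO_HIVE:
            hive_type = PARQUET_TO_HIVE[ptype]
        else:
            raise ValueError(f"no Hive mapping assumed for Parquet type {ptype!r}")
        columns.append((name, hive_type))
    return columns


def create_table_ddl(table, location, schema_text):
    """Build a CREATE EXTERNAL TABLE statement from the schema text."""
    cols = ",\n".join(f"  {name} {htype}" for name, htype in hive_columns(schema_text))
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{cols}\n)\n"
        f"STORED AS PARQUET\nLOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical schema text, shaped like parquet-tools schema output.
    sample = """message schema {
  optional int32 statement_Id;
  optional binary statement_MessageId (UTF8);
  optional boolean prepaidFlag;
}"""
    print(create_table_ddl("Hive_Parquet_Test", "s3://bucket_name/hive_parq_test", sample))
```

Generating the DDL from the file's own schema keeps the field names spelled exactly as they are stored in the file, which makes it easier to compare against a hand-written table definition.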
Created 05-30-2018 07:39 PM
Thank you again for your inputs.
Yes, I extracted the schema from the Parquet file and created the external table from it.
I am not clear on your comment "use parquet serde at time of creation of hive table." Based on the Hive documentation, I am using STORED AS PARQUET (Hive 2.3.2-amzn-2).
Also, I am not sure whether the conversion using the fastparquet Python library is causing this or whether it is a bug in Hive.
Created 05-30-2018 10:26 PM
I meant that you should create the Hive table with an explicit SerDe.
Syntax:
create table <dbname>.<tablename>(a string,b string,c string,d string,e string,f string,g string,h string,i string,j string) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat" OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";