Support Questions
Find answers, ask questions, and share your expertise

Hive error on Partitioned Parquet table


Explorer

Error while compiling statement: FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat

 

Hive returns this error when querying one of our tables. It is a partitioned table stored as Parquet. All our other tables query fine with Hive. I have searched the docs and the web and am not finding this error anywhere.

 

Thank you.

11 REPLIES

Re: Hive error on Partitioned Parquet table

Cloudera Employee
Did you create this table using Impala? Impala has a bug where a Parquet table it creates cannot then be read from Hive.

You can change the table inputformat using this command:

hive> alter table MYTABLE set fileformat parquet;

This changes the inputformat to one that Hive can read. The Hive inputformat is also compatible with Impala.

Re: Hive error on Partitioned Parquet table

Explorer
I did create this table using Impala. But I did so with all my other tables as well.

I did go ahead and run that command, yet it did not fix the error.

Re: Hive error on Partitioned Parquet table

Cloudera Employee
Could you run 'desc formatted MYTABLE' on the table where you ran the command?

Perhaps Hive does not change the inputformat if the table is already marked as a Parquet table.

Re: Hive error on Partitioned Parquet table

Explorer
Here is what I see in the inputformat when I run the describe:

org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Re: Hive error on Partitioned Parquet table

Explorer

Hi

 

I am getting the same error. Strangely, I haven't created the table in question using Impala, and "show create table" shows no sign of the Impala serde.

 

Deenar

 

hive> select * from mr_event;
FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat
hive> show create table mr_event;
OK
CREATE EXTERNAL TABLE `mr_event`(
`created` bigint,
`sourced_from` string,
`event_name` string,
`major_version` int,
`minor_version` int,
`feed_code` string,
`action` string,
`status` string,
`cob_date_key` int,
`core_batch_timestamp` string,
`core_batch_guid` string,
`core_audit` string)
PARTITIONED BY (
`cob_date` string COMMENT 'Partition column derived from \'null\' column, generated by Kite.')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://nameservice1/warehouse/projects/MERCURY/Risk/QuickSilver/MarketRiskData_DEV4/mr_event'
TBLPROPERTIES (
'STATS_GENERATED_VIA_STATS_TASK'='true',
'avro.schema.url'='hdfs://nameservice1/warehouse/projects/MERCURY/Risk/QuickSilver/MarketRiskData_DEV4/mr_event/_metadata/schemas/_1.avsc',
'keycolumns'='created,sourced_from',
'kite.compression.type'='snappy',
'kite.partition.expression'='provided(\"cob_date\", \"string\")',
'last_modified_by'='hive',
'last_modified_time'='1445557882',
'numRows'='8',
'qsmetadata'='{\"dataSet\":\"mr_event\",\"source\":[\"Jupiter\",\" eRisk\"],\"type\":\"fact\",\"partitionByCob\":true,\"whereClause\":\"COB_DATE = date:cobDate AND FEED_CODE IN :feed_code_list\",\"cubeProcess\":\"Add\"}',

Re: Hive error on Partitioned Parquet table

Explorer

Apparently gathering statistics in Impala caused this issue. SHOW CREATE TABLE shows the serde fine, but individual partitions have the Impala serde. Any workarounds?

 

hive> desc formatted mr_event PARTITION (cob_date=20151026);
OK
# col_name data_type comment

created bigint
sourced_from string
event_name string
major_version int
minor_version int
feed_code string
action string
status string
cob_date_key int
core_batch_timestamp string
core_batch_guid string
core_audit string

# Partition Information
# col_name data_type comment

cob_date string Partition column derived from 'null' column, generated by Kite.

# Detailed Partition Information
Partition Value: [20151026]
Database: marketriskdata_dev4
Table: mr_event
CreateTime: Fri Oct 23 00:54:47 BST 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Location: hdfs://nameservice1/warehouse/projects/MERCURY/Risk/QuickSilver/MarketRiskData_DEV4/mr_event/cob_date=20151026
Partition Parameters:
COLUMN_STATS_ACCURATE true
impala_intermediate_stats_chunk0 HBYAAAA=
impala_intermediate_stats_num_chunks 1
numFiles 0
numRows 0
rawDataSize -1
totalSize 0
transient_lastDdlTime 1445558087

# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: com.cloudera.impala.hive.serde.ParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Time taken: 0.192 seconds, Fetched: 46 row(s)

Re: Hive error on Partitioned Parquet table

Contributor

Yes, I believe it's a result of https://issues.cloudera.org/browse/IMPALA-2048.

 

There are some workarounds there. Altering the table to set the fileformat to Parquet fixes one part of it; if the table has partitions, you also have to run this for each one:

 

alter table $tablename partition($partitionname) set fileformat parquet

 

Unfortunately, this has to be run on every partition.
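For a table with many partitions, scripting the statements saves a lot of typing. A minimal sketch (not an official tool; it assumes the partition list comes from `SHOW PARTITIONS mytable`, whose output is one `col=value` spec per line, with multi-level partitions separated by `/`):

```python
def partition_spec(line):
    """Turn a SHOW PARTITIONS line like 'cob_date=20151026' into a
    PARTITION clause body: cob_date='20151026'. Multi-level partitions
    appear as 'year=2015/month=10' and become year='2015', month='10'."""
    parts = []
    for kv in line.strip().split("/"):
        col, _, val = kv.partition("=")
        parts.append("%s='%s'" % (col, val))
    return ", ".join(parts)

def alter_statements(table, partition_lines):
    """Generate one ALTER TABLE ... SET FILEFORMAT PARQUET per partition."""
    return [
        "ALTER TABLE %s PARTITION (%s) SET FILEFORMAT PARQUET;"
        % (table, partition_spec(line))
        for line in partition_lines if line.strip()
    ]

# Example input, as produced by: hive -e "SHOW PARTITIONS mr_event"
for s in alter_statements("mr_event", ["cob_date=20151026", "cob_date=20151027"]):
    print(s)
```

The printed statements can then be pasted into the Hive shell or saved to a file and run with `hive -f`.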

 


Re: Hive error on Partitioned Parquet table

Explorer

Hi,

 

We upgraded to CDH 5.5.1 and are still facing this issue with Parquet Hive tables. Our table has a few years of data partitioned by certain columns. Do we still need to alter all the partitions, or is there any update on this issue?

 

Thanks

Amit

Re: Hive error on Partitioned Parquet table

Contributor

I believe it's fixed in newer Impala, but that won't help in your case.

 

Did you check IMPALA-2048? It looks like someone posted some steps in the bottom comments to do it a bit quicker.
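One way to speed this up is to batch every per-partition ALTER into a single script and run it in one Hive session with `hive -f`, instead of paying session start-up cost per statement. A rough sketch (the table name and partition specs are placeholders; in practice the partition list would come from `SHOW PARTITIONS` output):

```python
# Write all ALTER statements to one .hql file, so a single
# "hive -f fix_partitions.hql" run repairs the table and every
# partition in one Hive session.
table = "mr_event"                  # placeholder table name
partitions = ["cob_date=20151026",  # placeholder partition specs,
              "cob_date=20151027"]  # e.g. from SHOW PARTITIONS output

with open("fix_partitions.hql", "w") as f:
    # Fix the table-level fileformat first...
    f.write("ALTER TABLE %s SET FILEFORMAT PARQUET;\n" % table)
    # ...then each partition, which keeps its own (broken) inputformat.
    for p in partitions:
        col, _, val = p.partition("=")
        f.write("ALTER TABLE %s PARTITION (%s='%s') "
                "SET FILEFORMAT PARQUET;\n" % (table, col, val))
```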