Reply
Explorer
Posts: 7
Registered: ‎05-08-2015

Hive error on Partitioned Parquet table

Error while compiling statement: FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat

 

Hive returns this error when querying one of our tables. It is a partitioned table stored as Parquet. All our other tables query fine with Hive. I have searched and am not seeing this error in the docs or through Google searches.

 

Thank you.

Cloudera Employee
Posts: 16
Registered: ‎01-27-2015

Re: Hive error on Partitioned Parquet table

Did you create this table using Impala? Impala has a bug where a Parquet table it creates cannot then be read from Hive.

You can change the table inputformat using this command:

hive> alter table MYTABLE set fileformat parquet;

This changes the input format to the correct one that Hive can read. The Hive input format is also compatible with Impala.
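
You can verify the change afterwards with a describe (MYTABLE is just a placeholder here; substitute your own table name):

hive> desc formatted MYTABLE;

The InputFormat line in the output should no longer mention the Impala class, and should instead read:

InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat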
Explorer
Posts: 7
Registered: ‎05-08-2015

Re: Hive error on Partitioned Parquet table

I did create this table using Impala, but I did so with all my other tables as well.

I went ahead and ran that command, yet it did not fix the error.
Cloudera Employee
Posts: 16
Registered: ‎01-27-2015

Re: Hive error on Partitioned Parquet table

Could you run 'desc formatted MYTABLE' on the table where you ran the command?

Hive may not change the input format if the table is already registered as a Parquet table.
Explorer
Posts: 7
Registered: ‎05-08-2015

Re: Hive error on Partitioned Parquet table

Here is the InputFormat I see when I run the describe:

org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Explorer
Posts: 21
Registered: ‎10-22-2015

Re: Hive error on Partitioned Parquet table

Hi

 

I am getting the same error. Strangely, I haven't created the table in question using Impala, and "show create table" shows no sign of the Impala SerDe.

 

Deenar

 

hive> select * from mr_event;
FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat
hive> show create table mr_event;
OK
CREATE EXTERNAL TABLE `mr_event`(
`created` bigint,
`sourced_from` string,
`event_name` string,
`major_version` int,
`minor_version` int,
`feed_code` string,
`action` string,
`status` string,
`cob_date_key` int,
`core_batch_timestamp` string,
`core_batch_guid` string,
`core_audit` string)
PARTITIONED BY (
`cob_date` string COMMENT 'Partition column derived from \'null\' column, generated by Kite.')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://nameservice1/warehouse/projects/MERCURY/Risk/QuickSilver/MarketRiskData_DEV4/mr_event'
TBLPROPERTIES (
'STATS_GENERATED_VIA_STATS_TASK'='true',
'avro.schema.url'='hdfs://nameservice1/warehouse/projects/MERCURY/Risk/QuickSilver/MarketRiskData_DEV4/mr_event/_metadata/schemas/_1.avsc',
'keycolumns'='created,sourced_from',
'kite.compression.type'='snappy',
'kite.partition.expression'='provided(\"cob_date\", \"string\")',
'last_modified_by'='hive',
'last_modified_time'='1445557882',
'numRows'='8',
'qsmetadata'='{\"dataSet\":\"mr_event\",\"source\":[\"Jupiter\",\" eRisk\"],\"type\":\"fact\",\"partitionByCob\":true,\"whereClause\":\"COB_DATE = date:cobDate AND FEED_CODE IN :feed_code_list\",\"cubeProcess\":\"Add\"}',

Explorer
Posts: 21
Registered: ‎10-22-2015

Re: Hive error on Partitioned Parquet table

Apparently gathering statistics in Impala caused this issue. SHOW CREATE TABLE shows the SerDe and input format fine, but individual partitions have the Impala input format. Any workarounds?

 

hive> desc formatted mr_event PARTITION (cob_date=20151026);
OK
# col_name data_type comment

created bigint
sourced_from string
event_name string
major_version int
minor_version int
feed_code string
action string
status string
cob_date_key int
core_batch_timestamp string
core_batch_guid string
core_audit string

# Partition Information
# col_name data_type comment

cob_date string Partition column derived from 'null' column, generated by Kite.

# Detailed Partition Information
Partition Value: [20151026]
Database: marketriskdata_dev4
Table: mr_event
CreateTime: Fri Oct 23 00:54:47 BST 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Location: hdfs://nameservice1/warehouse/projects/MERCURY/Risk/QuickSilver/MarketRiskData_DEV4/mr_event/cob_date=20151026
Partition Parameters:
COLUMN_STATS_ACCURATE true
impala_intermediate_stats_chunk0 HBYAAAA=
impala_intermediate_stats_num_chunks 1
numFiles 0
numRows 0
rawDataSize -1
totalSize 0
transient_lastDdlTime 1445558087

# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: com.cloudera.impala.hive.serde.ParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Time taken: 0.192 seconds, Fetched: 46 row(s)

Cloudera Employee
Posts: 30
Registered: ‎12-09-2014

Re: Hive error on Partitioned Parquet table


Yes, I believe it's a result of https://issues.cloudera.org/browse/IMPALA-2048.

 

There are some workarounds there. Altering the table to set the file format to Parquet fixes one part of it; if there are partitions, then you have to run this as well:

 

alter table $tablename partition($partitionname) set fileformat parquet

 

This has to be run on all the partitions, unfortunately.
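
One way to script it over every partition is a small shell loop (just a sketch, assuming a single string partition column and the hive CLI on the PATH; substitute your own table and column names):

for p in $(hive -S -e "SHOW PARTITIONS mr_event"); do
  key=${p%%=*}; val=${p#*=}    # split e.g. "cob_date=20151026" into column and value
  hive -e "ALTER TABLE mr_event PARTITION (${key}='${val}') SET FILEFORMAT PARQUET;"
done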

 

Explorer
Posts: 6
Registered: ‎11-18-2015

Re: Hive error on Partitioned Parquet table

Hi,

 

We upgraded to CDH 5.5.1 and are still facing this issue with Parquet Hive tables. Our table has a few years of data partitioned by certain columns. Do we still need to alter all the partitions, or is there any update on this issue?

 

Thanks

Amit

Cloudera Employee
Posts: 30
Registered: ‎12-09-2014

Re: Hive error on Partitioned Parquet table

I guess it's fixed for newer Impala, but that won't help your case.

 

Did you check IMPALA-2048? It looks like someone put some steps in the bottom comments to do it a bit quicker.
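
For example, the kind of shortcut described there is to fix the metastore directly in one statement (purely a sketch, assuming a MySQL-backed Hive metastore; this is unsupported, so back up the metastore database first):

-- rewrite the Impala input format in every storage descriptor,
-- which covers the table and all of its partitions at once
UPDATE SDS
SET INPUT_FORMAT = 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
WHERE INPUT_FORMAT = 'com.cloudera.impala.hive.serde.ParquetInputFormat';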