ERROR FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat

Expert Contributor

I am receiving the following error while inserting data into a Parquet table:

FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat

Please help me resolve this issue. I am also not able to find the jar that should contain this class.
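For context, the failure surfaces on an ordinary Hive insert. A minimal sketch with hypothetical table names (as a later reply reveals, the SELECT reading from another Parquet table is what trips the error):

INSERT INTO TABLE target_parquet_tbl PARTITION (dt='2015-07-01')
SELECT id, val FROM source_parquet_tbl;
-- FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat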


10 REPLIES

Expert Contributor

I am using CDH 5.4.2.

New Contributor

Hi,

Does your table have partitions? What is the flow of creation of the tables? Could you give some more details? We are facing the same issue in two different flows. There is a reference to this bug having been fixed in version 4.7.x.

 

We do not have this error in CDH 5.2.x, but we have been seeing it at least since 5.4.0.

 

If you can share some more information, we could combine these issues to request Cloudera's attention.

 

-Sreesankar

 

Expert Contributor

Hi,

Yes, we have partitions on the tables. It looks like the error appears after running INVALIDATE METADATA in Impala followed by COMPUTE STATS: the tables are partitioned and stored in Parquet format, and once COMPUTE STATS has been run from Impala we can no longer access them from Hive.

 

In our previous CDH version, 5.3.2, this worked fine: tables were accessible from both Hive and Impala even after running COMPUTE STATS. In the latest version, CDH 5.4.2, it looks like a bug. If Cloudera can help us it will be a great plus to stay on CDH 5.4.2; otherwise we will have to consider other options.
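To make the trigger concrete, this is the kind of sequence being described, with a hypothetical table name (run from impala-shell against a partitioned Parquet table):

INVALIDATE METADATA my_parquet_tbl;
COMPUTE STATS my_parquet_tbl;
-- After this, any Hive query that reads the table fails with:
-- FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat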

New Contributor

Hi,

 

I am having the same issue on CDH 5.4.2. The error appears even without running INVALIDATE METADATA or COMPUTE STATS from Impala. I have dropped and recreated the table, even with a different name, just to be sure there wasn't some residual metadata causing this. I am using partitions too.

 

I ended up having to create the table and partitions and insert the data using Impala. Surprisingly, Hive cannot select data from that table either: same error.
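One way to confirm it is the table metadata that is broken (rather than the data files) is to print the storage descriptor from the metastore; this only reads metadata, so it should still work even when SELECT fails. The table name is hypothetical:

hive> DESCRIBE FORMATTED my_parquet_tbl;
-- In the affected case, the InputFormat line shows
-- com.cloudera.impala.hive.serde.ParquetInputFormat
-- instead of the usual Hive Parquet input format class.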

ACCEPTED SOLUTION

It looks like you are running into this issue:

https://issues.cloudera.org/browse/IMPALA-2048

I'd suggest you give the workaround a try. We've identified the issue and fixed it.
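Judging from the replies below, the workaround amounts to creating a fresh external Hive table over the same HDFS data and re-registering the partitions. A hedged HiveQL sketch; the table name is borrowed from the session transcript below, while the schema, partition column, and location are hypothetical:

-- New external table pointing at the existing data files
CREATE EXTERNAL TABLE tbl_ptr (id BIGINT, val STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/broken_tbl';

-- Partitions must be re-created in the new table or queries return no rows:
ALTER TABLE tbl_ptr ADD PARTITION (dt='2015-07-01');
-- or, to discover all partition directories at once:
MSCK REPAIR TABLE tbl_ptr;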

New Contributor

Okay, first of all, my problem is exactly the same. What had somehow escaped me is that the INSERT I was running was actually inserting records selected from another Parquet table. The error, of course, came from Hive being unable to read the source Parquet table.

 

I tried the workaround and can confirm it is working. It took me a while to realize that you have to recreate the partitions in the new table, otherwise you get no output. It does introduce some warnings, though, as below:

 

hive> select * from tbl_ptr limit 1;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
<RECORD OUTPUT OKAY>
Time taken: 0.387 seconds, Fetched: 1 row(s)
hive> quit;
Jul 3, 2015 10:36:20 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jul 3, 2015 10:36:20 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 17636531 records.
Jul 3, 2015 10:36:20 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jul 3, 2015 10:36:21 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 755 ms. row count = 17636531

Expert Contributor

The workaround works, but it would be good if we could use the same tables and data from both Impala and Hive, as we did in previous versions. That would save us from creating multiple similar tables, one for Hive and another for Impala.

If it is fixed, that's great. Do we now need to reinstall CDH 5.4.2 to resolve this issue?


I agree completely that this is a critical issue, and I appreciate your patience in this matter.

 

The fix will be shipped as part of CDH 5.4.4, tentatively scheduled for the beginning of August.

 

Explorer

I am still seeing this issue with CDH 5.4.4-1.cdh5.4.4.p0.4.

Is this still an issue, or should it be resolved in my version of CDH?

 

Thanks,

Tom