Created on 06-27-2015 04:53 AM - edited 09-16-2022 02:32 AM
I am receiving the following error while inserting data into a Parquet table:
FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat
Please help me resolve this issue; I am also unable to find the JAR that is missing for this class.
Created 06-30-2015 10:42 PM
Looks like you are running into this issue:
https://issues.cloudera.org/browse/IMPALA-2048
I'd suggest you give the workaround a try.
We've identified the issue and fixed it.
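For readers who cannot reach the JIRA: the error suggests the table's metadata points at an Impala-specific class (com.cloudera.impala.hive.serde.ParquetInputFormat) that is not on Hive's classpath. A common way to repoint such a table at Hive's built-in Parquet classes is an ALTER TABLE in Hive; this is an illustrative sketch, not the exact text of the JIRA workaround, and the table name is hypothetical:

```sql
-- Illustrative only: repoint the table metadata at Hive's built-in
-- Parquet input/output formats instead of the missing Impala class.
-- `my_parquet_tbl` is a hypothetical table name.
ALTER TABLE my_parquet_tbl SET FILEFORMAT PARQUET;
```

Note that for partitioned tables each existing partition carries its own stored file-format metadata, so partitions may need the same treatment or to be recreated.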
Created 06-27-2015 04:54 AM
I am using CDH 5.4.2.
Created 06-28-2015 10:32 PM
Hi,
Does your table have partitions? What is the flow for creating the tables? Could you give some more details? We are facing the same issue in two different flows. There is a reference to this bug having been fixed in version 4.7.x.
We do not see this error in CDH 5.2.x, but we have seen it at least since 5.4.0.
If you can share some more information, we could combine these reports to request Cloudera's attention to this.
-Sreesankar
Created 06-28-2015 10:42 PM
Hi,
Yes, we have partitions on our tables. It looks like the error appears after running INVALIDATE METADATA in Impala followed by COMPUTE STATS. The tables are partitioned and stored in Parquet format, and once we run COMPUTE STATS in Impala we can no longer access them from Hive.
In our previous CDH version, 5.3.2, this worked fine: the tables were accessible from both Hive and Impala even after running COMPUTE STATS. In the latest version, CDH 5.4.2, it looks like a bug. If Cloudera can help us, it will be a great plus for sticking with CDH 5.4.2; otherwise we will have to consider other options.
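For reference, the sequence described above amounts to the following (the database and table names are hypothetical):

```sql
-- In impala-shell: the sequence that reportedly triggers the problem.
INVALIDATE METADATA mydb.events_parquet;  -- hypothetical table
COMPUTE STATS mydb.events_parquet;

-- Afterwards, in the hive CLI, reads of the same table fail with:
-- FAILED: SemanticException Class not found: com.cloudera.impala.hive.serde.ParquetInputFormat
SELECT * FROM mydb.events_parquet LIMIT 1;
```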
Created 06-30-2015 02:25 AM
Hi,
I am having the same issue on CDH 5.4.2. The error appears even without running INVALIDATE METADATA or COMPUTE STATS from Impala. I have dropped and recreated the table, even with a different name, just to be sure there wasn't some residual metadata causing this. I am using partitions too.
I ended up having to create the table and partitions and insert data using Impala. Surprisingly, Hive cannot select data from the table either. Same error.
Created on 07-03-2015 01:43 AM - edited 07-03-2015 01:46 AM
Okay, first of all, my problem is exactly the same. What had somehow escaped me is that the INSERT I was running was actually selecting records from another Parquet table; the error, of course, came from Hive being unable to read the source Parquet table.
I tried the workaround and can confirm it works. It took me a while to realize that you have to recreate the partitions in the new table, otherwise you get no output. It does introduce some warnings, though, as below:
hive> select * from tbl_ptr limit 1;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
<RECORD OUTPUT OKAY>
Time taken: 0.387 seconds, Fetched: 1 row(s)
hive> quit;
Jul 3, 2015 10:36:20 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jul 3, 2015 10:36:20 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 17636531 records.
Jul 3, 2015 10:36:20 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jul 3, 2015 10:36:21 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 755 ms. row count = 17636531
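Recreating the partitions in the new table, as mentioned above, can be done one partition at a time or in bulk. A sketch (the partition column `dt` and its value are hypothetical; `tbl_ptr` is the table from the session above):

```sql
-- Add back each partition explicitly so Hive can see the data...
ALTER TABLE tbl_ptr ADD PARTITION (dt='2015-07-01');

-- ...or, if the partition directories already exist on HDFS,
-- let Hive discover them all at once:
MSCK REPAIR TABLE tbl_ptr;
```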
Created 07-07-2015 12:05 PM
It works with the workaround, but it would be good if we could use tables and data from both Impala and Hive as we did in our previous versions. That would save us from creating multiple similar tables, one for Hive and another for Impala.
If it is fixed, that's great. So do we now need to reinstall CDH 5.4.2 again to resolve this issue?
Created 07-08-2015 08:30 AM
I agree completely that this is a critical issue, and I appreciate your patience in this matter.
The fix will ship as part of CDH 5.4.4, tentatively scheduled for the beginning of August.
Created 08-10-2015 10:29 AM
I am still seeing this issue with CDH 5.4.4-1.cdh5.4.4.p0.4.
Is this still an issue, or should it be resolved in my version of CDH?
Thanks,
Tom