SerDe parquet.hive.serde.ParquetHiveSerDe does not exist
Created on 09-04-2013 04:29 PM - edited 09-16-2022 08:05 AM
G'Day all,
We've moved from Impala 1.1 to 1.1.1 and are evaluating the use of Parquet. Unfortunately, after creating the table it cannot be used; the following error is raised when attempting to access (or even drop) the table from impala-shell:
org.apache.hadoop.hive.serde2.SerDeException SerDe parquet.hive.serde.ParquetHiveSerDe does not exist
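For reference, the table was created from impala-shell with a statement along these lines (the table and column names here are only placeholders, not the actual schema):
impala> create table events_parquet (id BIGINT, payload STRING) stored as parquetfile;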
Details of installation:
Clean install of Impala 1.1
Package upgrade to 1.1.1
parquet-hive-1.0.jar exists in
/opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala/lib/
/opt/cloudera/parcels/CDH-4.3.1-1.cdh4.3.1.p0.110/lib/hive/lib
Has anyone else encountered this issue?
Derek
Created 09-09-2013 02:52 PM
Hi Derek -
To allow compatibility between Impala and Hive, Parquet-backed tables created after 1.1.1 contain additional metadata about the file format. Run the following commands in Hive to update your table's metadata:
ALTER TABLE table_name SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
ALTER TABLE table_name SET FILEFORMAT INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat" OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
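One way to confirm the change took effect is to inspect the table definition in Hive afterwards and check the SerDe Library, InputFormat and OutputFormat fields in the output (the table name is a placeholder):
hive> DESCRIBE FORMATTED table_name;
Depending on the Impala version, the Impala side may also need its metadata refreshed (e.g. by running REFRESH in impala-shell) before the table is queryable again.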
Created 09-18-2013 12:43 PM
I am getting the following error when I try to update the Hive metadata:
Exception in thread "main" java.lang.NoClassDefFoundError: parquet/Log
at parquet.hive.DeprecatedParquetInputFormat.<clinit>(DeprecatedParquetInputFormat.java:63)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:299)
at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:96)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:966)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1105)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableSerde(DDLSemanticAnalyzer.java:1033)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:209)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:457)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:929)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: parquet.Log
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Created 10-16-2013 12:00 PM
Hey there -
I am really sorry for not replying back; I forgot to subscribe to the thread. Have you got this issue worked out already? If so, it would be nice to hear the fix so we can close the loop.
If not, I suspect it's because you need more of the Parquet JARs in that CDH directory. Is this exception being thrown from the Hive shell or the Impala shell? Run the following command for me so I can see where the JARs currently are:
$ find /opt/cloudera/parcels/CDH/lib/hive/ -name "parquet*.jar"
Also, is this using the Hive Metastore server? Make sure to restart that, as well as Impala, and try again.
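For what it's worth, a NoClassDefFoundError on parquet/Log often indicates that only parquet-hive is on Hive's classpath and the supporting Parquet JARs (parquet-common and friends) are missing. One possible stopgap is to add them to the Hive session by hand; the JAR names and versions below are illustrative and depend on which parcels are installed:
hive> ADD JAR /opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala/lib/parquet-common-1.0.jar;
hive> ADD JAR /opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala/lib/parquet-column-1.0.jar;
hive> ADD JAR /opt/cloudera/parcels/IMPALA-1.1.1-1.p0.17/lib/impala/lib/parquet-hadoop-1.0.jar;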
Created 11-13-2013 10:56 AM
I'm facing the same problem. I installed Impala 1.1.1 and CDH 4.1.2.
I created a table in impala-shell using the Parquet file format, but I am not able to insert data into it.
impala> create table data_parquet like data stored as parquetfile;
impala> insert into data_parquet select * from data;
Query: insert into data_parquet select * from data
ERROR: AnalysisException: Target table 'default.data_parquet' is incompatible with SELECT / PARTITION expressions.
Expression 'data.payload_user_uid' (type: BIGINT) is not compatible with column 'payload_user_type' (type: STRING)
I tried the suggestion from this thread:
hive> alter table data_parquet set serde 'parquet.hive.serde.ParquetHiveSerDe';
FAILED: RuntimeException java.lang.ClassNotFoundException: parquet.hive.DeprecatedParquetInputFormat
I have the Parquet JAR in the Hive lib directory:
$ find /opt/cloudera/parcels/CDH/lib/hive/lib -name "parquet*.jar"
/opt/cloudera/parcels/CDH/lib/hive/lib/parquet-hive-1.0.jar
Created 11-16-2013 11:16 AM
I solved my problem.
My problem was twofold. First, I had a lingering old install of Hive, which is what my shell was defaulting to; I removed that second, bad install. Second, the latest version of CDH seems to be broken for Parquet: a number of JARs are missing. I followed the guide below and am now able to use Parquet files.
http://analog99.wordpress.com/2013/01/07/setting-up-stats-db-in-hive/
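In case it helps anyone hitting the same thing, a quick way to check which Hive install the shell is actually picking up (generic commands, not from the guide above):
$ which hive
$ hive --version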
Created 06-28-2015 10:39 PM
Hi,
Can you publish the list of JARs you used to solve this issue? I am facing the same issue in version 5.4.0 and have read that it occurs in 5.4.2 as well.
-Sreesankar
Created 06-29-2015 12:12 PM
I believe you may be hitting https://issues.cloudera.org/browse/IMPALA-2048 or a variant thereof. Please have a look at that JIRA; it includes a workaround that may be acceptable to you.
Created 07-03-2015 02:05 AM
This seems to be a bug; I hope we get the fix included in CDH 5.4.3.
One of the workarounds observed was to have two Hive tables pointing to the same location, and have one of the tables accessed through Impala while the other is accessed through Hive, as sketched below.
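A rough sketch of that two-table workaround, assuming an existing Parquet data directory (the table names, columns and HDFS path are placeholders):
-- created and queried only from impala-shell
CREATE EXTERNAL TABLE events_for_impala (id BIGINT, payload STRING)
STORED AS PARQUET
LOCATION '/data/events_parquet';
-- created and queried only from the Hive shell, over the same files
CREATE EXTERNAL TABLE events_for_hive (id BIGINT, payload STRING)
STORED AS PARQUET
LOCATION '/data/events_parquet';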
