Support Questions
Find answers, ask questions, and share your expertise




I have a Parquet-based table that I can successfully select from in Hive and Impala,

but when I try to select from that table in Shark, I get the following error:


14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source tables
FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException:
14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
    at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(
    at org.apache.hadoop.hive.ql.metadata.Table.<init>(
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(


Which jar includes this class? What do I need to install, link, or configure to get rid of the error?

I am using CDH5; the Parquet libraries are in /opt/cloudera/parcels/CDH/lib/parquet
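For anyone trying to narrow this down: a sketch of how one might search the parcel's jars for the missing class. The log above truncates the class name, so the name used below is only an illustrative guess; substitute the exact name from your own stack trace.

```shell
# Search every jar in the Parquet parcel directory for the missing class.
# NOTE: the class name is an assumption -- the log above cuts it off;
# replace it with the name from your own ClassNotFoundException.
CLASS='DeprecatedParquetInputFormat'
for j in /opt/cloudera/parcels/CDH/lib/parquet/*.jar; do
  if unzip -l "$j" 2>/dev/null | grep -q "$CLASS"; then
    echo "found in: $j"
  fi
done
```

Whichever jar this prints is the one that needs to be on Shark's classpath.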


thanks in advance, Gerd


Re: ClassNotFoundException:



The previous error when accessing a Parquet-based table via Shark ("java.lang.ClassNotFoundException:") was resolved by adding
parquet-hive-bundle-1.4.1.jar to Shark's lib folder.
Now the Hive metastore can be read successfully (including the Parquet-based table).

But when I try to select from that table, I get:

org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException:
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

This is really strange, since the class is also included in parquet-hive-bundle-1.4.1.jar.

I copied the jar to both lib folders: Shark's (/opt/shark/shark-0.9.1/lib) and Spark's (under /opt/cloudera/parcels...)
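In case it helps anyone hitting the same wall: this second exception is thrown on the Spark executors, and dropping a jar into a lib folder does not necessarily put it on the executors' classpath. A sketch of two things worth trying; the paths are the ones from this thread, and whether your deployment reads spark-env.sh for SPARK_CLASSPATH is an assumption about a Spark 0.9-era setup.

```shell
# Option 1: add the bundle to the executor classpath, e.g. in spark-env.sh
# (SPARK_CLASSPATH was the usual mechanism in the Spark 0.9 era).
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/shark/shark-0.9.1/lib/parquet-hive-bundle-1.4.1.jar

# Option 2: from within the Shark shell, distribute the jar per-session
# with Hive's ADD JAR command:
#   ADD JAR /opt/shark/shark-0.9.1/lib/parquet-hive-bundle-1.4.1.jar;
```

Restart the Shark server after changing the environment so the new classpath takes effect.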

I'm getting more and more confused ;)

Any help?

regards, Gerd