Support Questions
Find answers, ask questions, and share your expertise




I have a Parquet-based table that I can successfully select from in Hive and Impala,

but when I try to select from that table in Shark, I get the following error:


14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source tables
FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException:
14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
    at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(
    at org.apache.hadoop.hive.ql.metadata.Table.<init>(
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(


Which jar includes this class? What do I need to install, link, or configure to get rid of the error?

I am using CDH5; the Parquet libraries are in /opt/cloudera/parcels/CDH/lib/parquet
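For anyone trying to narrow this down: a sketch of how one might search the parcel's jars for the missing class. The log above truncates the class name, so the name used below is only an illustrative guess; substitute the exact name from your own stack trace.

```shell
# Search every jar in the Parquet parcel directory for the missing class.
# NOTE: the class name is an assumption -- the log above cuts it off;
# replace it with the name from your own ClassNotFoundException.
CLASS='DeprecatedParquetInputFormat'
for j in /opt/cloudera/parcels/CDH/lib/parquet/*.jar; do
  if unzip -l "$j" 2>/dev/null | grep -q "$CLASS"; then
    echo "found in: $j"
  fi
done
```

Whichever jar this prints is the one that needs to be on Shark's classpath.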


thanks in advance, Gerd


Re: ClassNotFoundException:



The previous error when accessing a Parquet-based table via Shark ("java.lang.ClassNotFoundException:") was resolved by adding
parquet-hive-bundle-1.4.1.jar to Shark's lib folder.
Now the Hive metastore can be read successfully (including the Parquet-based table).

But when I try to select from that table, I get:

org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException:
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

This is really strange, since the class is also included in parquet-hive-bundle-1.4.1.jar.

I copied the jar to both lib folders: Shark's (/opt/shark/shark-0.9.1/lib) and Spark's (under /opt/cloudera/parcels...)
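In case it helps anyone hitting the same wall: this second exception is thrown on the Spark executors, and dropping a jar into a lib folder does not necessarily put it on the executors' classpath. A sketch of two things worth trying; the paths are the ones from this thread, and whether your deployment reads spark-env.sh for SPARK_CLASSPATH is an assumption about a Spark 0.9-era setup.

```shell
# Option 1: add the bundle to the executor classpath, e.g. in spark-env.sh
# (SPARK_CLASSPATH was the usual mechanism in the Spark 0.9 era).
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/shark/shark-0.9.1/lib/parquet-hive-bundle-1.4.1.jar

# Option 2: from within the Shark shell, distribute the jar per-session
# with Hive's ADD JAR command:
#   ADD JAR /opt/shark/shark-0.9.1/lib/parquet-hive-bundle-1.4.1.jar;
```

Restart the Shark server after changing the environment so the new classpath takes effect.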

I'm getting more and more confused ;)

Any help?

regards, Gerd