New Contributor
Posts: 5
Registered: ‎07-28-2015

Hive jars not found on classpath

Hi,

I'm trying to access a Hive dataset through the Kite Java API. Running my jar gives the following error:

org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:hive:test/test_table
Check that JARs for hive datasets are on the classpath
	at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:128)
	at org.kitesdk.data.Datasets.load(Datasets.java:103)
	at org.kitesdk.data.Datasets.load(Datasets.java:165)
	at de.dlh.lht.ti.ti54.paa.parrot.out.HDFSOutputWriter.writeFlightValues(HDFSOutputWriter.java:50)
	at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.parseFile(QARParserCoordinator.java:253)
	at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.crawlDirectory(QARParserCoordinator.java:220)
	at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.crawlDirectory(QARParserCoordinator.java:198)
	at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.main(QARParserCoordinator.java:114)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Setting the logging level for org.kitesdk.data.spi.Registration to DEBUG gives the following insight:

Registered repository URIs:
	URIPattern{pattern=file:/*path?absolute=true}
	URIPattern{pattern=file:*path}
	URIPattern{pattern=hdfs:/*path?absolute=true}
	URIPattern{pattern=hdfs:*path}
	URIPattern{pattern=webhdfs:/*path?absolute=true}

So the Hive URIPattern is indeed not registered. However, printing the classpath from within the Java application via the ClassLoader shows, among others, the following jars:

hive-serde-1.1.0-cdh5.5.1.jar
hive-service-1.1.0-cdh5.5.1.jar
hive-accumulo-handler-1.1.0-cdh5.5.1.jar
hive-contrib-1.1.0-cdh5.5.1.jar
hive-common-1.1.0-cdh5.5.1.jar
hive-ant-1.1.0-cdh5.5.1.jar
hive-metastore-1.1.0-cdh5.5.1.jar
hive-shims-0.23-1.1.0-cdh5.5.1.jar
hive-shims-1.1.0-cdh5.5.1.jar
hive-shims-scheduler-1.1.0-cdh5.5.1.jar
hive-shims-common-1.1.0-cdh5.5.1.jar
hive-cli-1.1.0-cdh5.5.1.jar
hive-exec-1.1.0-cdh5.5.1.jar
hive-testutils-1.1.0-cdh5.5.1.jar
hive-hbase-handler-1.1.0-cdh5.5.1.jar
hive-jdbc-1.1.0-cdh5.5.1.jar
hive-jdbc-1.1.0-cdh5.5.1-standalone.jar
hive-beeline-1.1.0-cdh5.5.1.jar
hive-hwi-1.1.0-cdh5.5.1.jar
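
Every jar in this list is one of Hive's own; none is a Kite module. A minimal sketch for checking whether a particular jar made it onto the classpath is shown here. The class name `ClasspathScan` and the method `findJars` are hypothetical, and the fragment `kite-data-hive` is only an assumption about how the Kite Hive module's jar is named. It reads `java.class.path`, so jars added by a custom class loader would not show up.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathScan {

    // Returns the classpath entries whose file name contains the given fragment.
    static List<String> findJars(String classpath, String fragment) {
        List<String> matches = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            String fileName = new File(entry).getName();
            if (fileName.contains(fragment)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "kite-data-hive" is an assumed jar name fragment, not confirmed by the log above.
        String fragment = args.length > 0 ? args[0] : "kite-data-hive";
        String classpath = System.getProperty("java.class.path", "");
        List<String> matches = findJars(classpath, fragment);
        if (matches.isEmpty()) {
            System.out.println("No classpath entry contains '" + fragment + "'");
        } else {
            for (String match : matches) {
                System.out.println("Found: " + match);
            }
        }
    }
}
```

An empty result for the Kite module would be consistent with the hive URI pattern missing from the registered list.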

I've also tried setting the following environment variables after looking at the startup script of the kite-dataset tool:

HADOOP_HOME
HADOOP_CLASSPATH
HADOOP_COMMON_HOME
HADOOP_MAPRED_HOME
HBASE_HOME
HIVE_CONF_DIR
HIVE_HOME

Kite doesn't seem to pick these up either. Any help in getting this to work would be greatly appreciated.

Thanks,

Jasper

Posts: 1,567
Kudos: 289
Solutions: 240
Registered: ‎07-31-2013

Re: Hive jars not found on classpath

You are likely missing the kite-data-hive dependency and its associated jars at runtime. The dataset:hive: URI is handled by classes in that module; if those classes are absent at runtime, you get the error "Unknown dataset URI pattern: dataset:hive:...".
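
One way to check this from inside the application is to look for service provider files on the classpath. The sketch below assumes that Kite modules register their URI handlers through Java's `ServiceLoader` mechanism under an interface named `org.kitesdk.data.spi.Loadable`; that interface name is an assumption and should be verified against the Kite version in use. The class `ServiceProviderCheck` and its method are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ServiceProviderCheck {

    // Lists every META-INF/services file for the given interface that the
    // current class loader can see. Each URL names the jar that provides it.
    static List<URL> findServiceFiles(String serviceInterface) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> found = new ArrayList<>();
        try {
            Enumeration<URL> urls =
                loader.getResources("META-INF/services/" + serviceInterface);
            while (urls.hasMoreElements()) {
                found.add(urls.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed interface name; not confirmed by this thread.
        String service = "org.kitesdk.data.spi.Loadable";
        for (URL url : findServiceFiles(service)) {
            System.out.println(url);
        }
    }
}
```

If the assumption holds, a jar for the Hive module should appear in the output once kite-data-hive is on the runtime classpath; seeing only the core Kite jar would match the file, hdfs and webhdfs patterns being the only ones registered.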
Backline Customer Operations Engineer
Explorer
Posts: 10
Registered: ‎05-07-2016

Re: Hive jars not found on classpath

Hi Harsh,

I get the error "Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found" when I run a simple query in Beeline.

After I run the command "add jar /opt/cloudera/parcels/-----", the query works fine. Do I need to run such a command manually for every jar? Is there a permanent alternative?

I use CDH 5.10

Regards

Siddesh
