Hive jars not found on classpath

Hi,

I'm trying to access a Hive dataset through the Kite Java API. Running my jar gives the following error:

org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:hive:test/test_table
Check that JARs for hive datasets are on the classpath
	at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:128)
	at org.kitesdk.data.Datasets.load(Datasets.java:103)
	at org.kitesdk.data.Datasets.load(Datasets.java:165)
	at de.dlh.lht.ti.ti54.paa.parrot.out.HDFSOutputWriter.writeFlightValues(HDFSOutputWriter.java:50)
	at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.parseFile(QARParserCoordinator.java:253)
	at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.crawlDirectory(QARParserCoordinator.java:220)
	at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.crawlDirectory(QARParserCoordinator.java:198)
	at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.main(QARParserCoordinator.java:114)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Setting the logging level for org.kitesdk.data.spi.Registration to DEBUG gives the following insight:

Registered repository URIs:
	URIPattern{pattern=file:/*path?absolute=true}
	URIPattern{pattern=file:*path}
	URIPattern{pattern=hdfs:/*path?absolute=true}
	URIPattern{pattern=hdfs:*path}
	URIPattern{pattern=webhdfs:/*path?absolute=true}

So the hive URI pattern is indeed not registered. However, printing the classpath from within the Java application via the ClassLoader shows, among others, the following jars:

hive-serde-1.1.0-cdh5.5.1.jar
hive-service-1.1.0-cdh5.5.1.jar
hive-accumulo-handler-1.1.0-cdh5.5.1.jar
hive-contrib-1.1.0-cdh5.5.1.jar
hive-common-1.1.0-cdh5.5.1.jar
hive-ant-1.1.0-cdh5.5.1.jar
hive-metastore-1.1.0-cdh5.5.1.jar
hive-shims-0.23-1.1.0-cdh5.5.1.jar
hive-shims-1.1.0-cdh5.5.1.jar
hive-shims-scheduler-1.1.0-cdh5.5.1.jar
hive-shims-common-1.1.0-cdh5.5.1.jar
hive-cli-1.1.0-cdh5.5.1.jar
hive-exec-1.1.0-cdh5.5.1.jar
hive-testutils-1.1.0-cdh5.5.1.jar
hive-hbase-handler-1.1.0-cdh5.5.1.jar
hive-jdbc-1.1.0-cdh5.5.1.jar
hive-jdbc-1.1.0-cdh5.5.1-standalone.jar
hive-beeline-1.1.0-cdh5.5.1.jar
hive-hwi-1.1.0-cdh5.5.1.jar
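
One way to narrow this down is to filter the classpath for Kite jars rather than Hive jars. The following is only a minimal sketch: the class name KiteClasspathCheck and the helper findJars are made up for illustration, and it assumes the Kite modules ship as jars whose file names start with "kite-". Note that java.class.path only reflects the classpath the JVM was launched with, so jars pulled in later by another class loader (for example when running through hadoop jar) may not show up here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class KiteClasspathCheck {

    // Returns the file names of all classpath entries that start with the given prefix.
    // (Hypothetical helper, not part of Kite or Hadoop.)
    public static List<String> findJars(String classpath, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            String name = new File(entry).getName();
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Classpath the JVM was started with; may be incomplete under hadoop jar.
        String classpath = System.getProperty("java.class.path");
        System.out.println("Kite jars:           " + findJars(classpath, "kite-"));
        System.out.println("kite-data-hive jars: " + findJars(classpath, "kite-data-hive"));
    }
}
```

If the second list comes back empty while the Hive jars above are present, the Hive jars alone are probably not what the error message is asking for.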

I've also tried setting the following environment variables after looking at the startup script of the kite-dataset tool:

HADOOP_HOME
HADOOP_CLASSPATH
HADOOP_COMMON_HOME
HADOOP_MAPRED_HOME
HBASE_HOME
HIVE_CONF_DIR
HIVE_HOME

These don't seem to be picked up by Kite either. Any help in getting this to work would be greatly appreciated.

Thanks,

Jasper

Re: Hive jars not found on classpath

You are likely missing the kite-data-hive dependency and its associated jars at runtime. The dataset:hive: URI scheme is registered by classes in that module, so having only the Hive jars on the classpath is not enough. If the kite-data-hive classes are absent at runtime, you get the error "Unknown dataset URI pattern: dataset:hive:...".
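
To check this from inside the application, you could list which jars advertise a Kite loader to the class loader that is actually in use. This is a rough sketch under two assumptions that are not confirmed in this thread: that Kite discovers its URI handlers through the standard java.util.ServiceLoader mechanism, and that the service interface is named org.kitesdk.data.spi.Loadable. The class LoadableProviders and its method listProviders are hypothetical names; only JDK calls are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class LoadableProviders {

    // Lists "<provider file URL> -> <implementation class>" for every
    // META-INF/services/<serviceName> file visible to the given class loader.
    public static List<String> listProviders(ClassLoader loader, String serviceName) {
        List<String> found = new ArrayList<>();
        try {
            Enumeration<URL> files = loader.getResources("META-INF/services/" + serviceName);
            while (files.hasMoreElements()) {
                URL url = files.nextElement();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        int hash = line.indexOf('#');          // strip comments
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            found.add(url + " -> " + line);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumption: this is the service interface Kite uses for URI registration.
        String service = "org.kitesdk.data.spi.Loadable";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String provider : listProviders(loader, service)) {
            System.out.println(provider);
        }
    }
}
```

If the assumptions hold, an entry coming from a kite-data-hive jar should appear once that module is on the runtime classpath. If only a kite-data-core entry shows up, that would match the DEBUG output above, where only the file, hdfs and webhdfs patterns are registered.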