04-01-2016 04:07 AM
Hi,
I'm trying to access a hive dataset through the Java API. Running my jar gives the following error:
org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:hive:test/test_table
Check that JARs for hive datasets are on the classpath
    at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:128)
    at org.kitesdk.data.Datasets.load(Datasets.java:103)
    at org.kitesdk.data.Datasets.load(Datasets.java:165)
    at de.dlh.lht.ti.ti54.paa.parrot.out.HDFSOutputWriter.writeFlightValues(HDFSOutputWriter.java:50)
    at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.parseFile(QARParserCoordinator.java:253)
    at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.crawlDirectory(QARParserCoordinator.java:220)
    at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.crawlDirectory(QARParserCoordinator.java:198)
    at de.dlh.lht.ti.ti54.paa.parrot.main.QARParserCoordinator.main(QARParserCoordinator.java:114)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Setting the logging level for org.kitesdk.data.spi.Registration to DEBUG gives the following insight:
Registered repository URIs:
URIPattern{pattern=file:/*path?absolute=true}
URIPattern{pattern=file:*path}
URIPattern{pattern=hdfs:/*path?absolute=true}
URIPattern{pattern=hdfs:*path}
URIPattern{pattern=webhdfs:/*path?absolute=true}
So the hive URIPattern is indeed not registered. However, printing the classpath from within the Java application via ClassLoader shows, among others, the following jars:
hive-serde-1.1.0-cdh5.5.1.jar
hive-service-1.1.0-cdh5.5.1.jar
hive-accumulo-handler-1.1.0-cdh5.5.1.jar
hive-contrib-1.1.0-cdh5.5.1.jar
hive-common-1.1.0-cdh5.5.1.jar
hive-ant-1.1.0-cdh5.5.1.jar
hive-metastore-1.1.0-cdh5.5.1.jar
hive-shims-0.23-1.1.0-cdh5.5.1.jar
hive-shims-1.1.0-cdh5.5.1.jar
hive-shims-scheduler-1.1.0-cdh5.5.1.jar
hive-shims-common-1.1.0-cdh5.5.1.jar
hive-cli-1.1.0-cdh5.5.1.jar
hive-exec-1.1.0-cdh5.5.1.jar
hive-testutils-1.1.0-cdh5.5.1.jar
hive-hbase-handler-1.1.0-cdh5.5.1.jar
hive-jdbc-1.1.0-cdh5.5.1.jar
hive-jdbc-1.1.0-cdh5.5.1-standalone.jar
hive-beeline-1.1.0-cdh5.5.1.jar
hive-hwi-1.1.0-cdh5.5.1.jar
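For reference, a minimal sketch of this kind of classpath check. The jars listed are all Hive's own, so it may also be worth looking for a Kite jar that provides the hive dataset support. The class ClasspathCheck, its method names, and the jar-name fragment "kite-data-hive" are assumptions made for illustration and do not come from Kite itself; only org.kitesdk.data.Datasets is taken from the stack trace above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Kite or Hadoop.
final class ClasspathCheck {

    // Returns the classpath entries whose file name contains the given fragment.
    static List<String> entriesContaining(String classpath, String fragment) {
        List<String> matches = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (new File(entry).getName().contains(fragment)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    // Returns true if the class can be found by the thread's context class loader.
    // The class is looked up without being initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false,
                    Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");

        // "kite-data-hive" is an assumed jar-name fragment; adjust as needed.
        System.out.println("Entries matching kite-data-hive: "
                + entriesContaining(classpath, "kite-data-hive"));

        // Class name taken from the stack trace in this post.
        System.out.println("org.kitesdk.data.Datasets loadable: "
                + isLoadable("org.kitesdk.data.Datasets"));

        // Optionally check any other class name passed on the command line.
        for (String className : args) {
            System.out.println(className + " loadable: " + isLoadable(className));
        }
    }
}
```

Note that java.class.path only reflects the JVM's own classpath. A jar started through "hadoop jar" may see additional classes through the class loader that RunJar sets up, which is why isLoadable goes through the context class loader instead of parsing the classpath string.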
I've also tried setting the following environment variables after looking at the startup script of the kite-dataset tool:
HADOOP_HOME HADOOP_CLASSPATH HADOOP_COMMON_HOME HADOOP_MAPRED_HOME HBASE_HOME HIVE_CONF_DIR HIVE_HOME
These don't seem to be picked up by Kite either. Any help in getting this to work would be greatly appreciated.
Thanks,
Jasper
04-06-2016 10:18 AM
07-19-2017 09:44 AM
Hi Harsh,
I get the error "Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found" when I run a simple query in beeline.
After I run the command "add jar /opt/cloudera/parcels/-----", the query works fine. Do I need to run such a command manually for every jar, or is there a permanent alternative?
I use CDH 5.10.
Regards
Siddesh