Java Program to query a Kite Dataset

Hello,

 
I am working through my first Kite SDK tutorial and have written this program (based on the GroupLens movie dataset).
 
I have already created the dataset and loaded it with data using the kite-dataset utility. Now I am writing a Java program just to query it.
 
Here is the Java code:
 
package com.abhishek.HelloKite;
// hadoop
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.conf.Configured;

// kite
import org.kitesdk.data.Dataset;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.DatasetReader;

// avro
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericData.Record;

/**
 * Hello world!
 *
 */
public class App extends Configured implements Tool 
{
	public int run(String[] args) throws Exception {
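		// Load the "movies" dataset through its Hive dataset URI as generic Avro records
		Dataset<Record> movies = Datasets.load("dataset:hive?dataset=movies", Record.class);

		DatasetReader<Record> reader = null;

		try {
			reader = movies.newReader();
			// Iterate over the reader opened above, so the finally block closes the one in use
			for (GenericRecord movie : reader) {
				System.out.println(movie);
			}
		} finally {
			if (reader != null) reader.close();
		}
		return 0;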
	}
	
	public static void main(String[] args) throws Exception {
		int rc = ToolRunner.run(new App(), args);
		System.exit(rc);
	}
}

 

Here is my pom.xml
 
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.abhishek</groupId>
	<artifactId>HelloKite</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>HelloKite</name>
	<url>http://maven.apache.org</url>

	<parent>
		<groupId>org.kitesdk</groupId>
		<artifactId>kite-app-parent-cdh5</artifactId>
		<version>0.17.1</version>
	</parent>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>2.5.1</version>
				<configuration>
					<source>1.6</source>
					<target>1.6</target>
					<compilerArgument>-Xlint:unchecked</compilerArgument>
					<showDeprecation>true</showDeprecation>
					<showWarnings>true</showWarnings>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-failsafe-plugin</artifactId>
				<version>2.18</version>
				<executions>
					<execution>
						<goals>
							<goal>integration-test</goal>
							<goal>verify</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>

	<dependencies>
		<dependency>
			<groupId>log4j</groupId>
			<artifactId>log4j</artifactId>
			<version>${hadoop.log4j.version}</version>
		</dependency>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-log4j12</artifactId>
			<version>${hadoop.slf4j.version}</version>
		</dependency>
		<dependency>
			<groupId>org.kitesdk</groupId>
			<artifactId>kite-hadoop-cdh5-dependencies</artifactId>
			<version>${kite.version}</version>
			<type>pom</type>
			<scope>compile</scope> <!-- provide Hadoop dependencies -->
		</dependency>
		<dependency>
			<groupId>org.apache.hive</groupId>
			<artifactId>hive-exec</artifactId>
			<version>${kite.hive.version}</version>
			<scope>compile</scope> <!-- provide Hive dependencies -->
		</dependency>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.10</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.kitesdk</groupId>
			<artifactId>kite-minicluster</artifactId>
			<version>${kite.version}</version>
			<scope>test</scope>
		</dependency>
	</dependencies>

</project>

 

 

But when I try to build the final jar file (which I will copy to my Hadoop cluster and execute), I get this error:
 
[INFO] Scanning for projects...
[INFO] 
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building HelloKite 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- kite-maven-plugin:0.17.1:package-app (default-cli) @ HelloKite ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.044 s
[INFO] Finished at: 2014-12-25T16:30:03-06:00
[INFO] Final Memory: 32M/617M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.kitesdk:kite-maven-plugin:0.17.1:package-app (default-cli) on project HelloKite: The parameters 'toolClass' for goal org.kitesdk:kite-maven-plugin:0.17.1:package-app are missing or invalid -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
 
 
What is going wrong? Can you help me resolve this?

Re: Java Program to query a Kite Dataset

Which Maven command are you running to build the jar?
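
For reference, the error message names a 'toolClass' parameter on the package-app goal of kite-maven-plugin 0.17.1. Below is a minimal sketch of supplying that parameter in the pom.xml. It assumes the plugin expects the fully qualified name of the Tool implementation, which this thread does not confirm. The class name comes from the Java code in the question.

```xml
<build>
	<plugins>
		<plugin>
			<groupId>org.kitesdk</groupId>
			<artifactId>kite-maven-plugin</artifactId>
			<!-- Version taken from the build log in the question -->
			<version>0.17.1</version>
			<configuration>
				<!-- Assumption: toolClass takes the fully qualified name of the Tool class -->
				<toolClass>com.abhishek.HelloKite.App</toolClass>
			</configuration>
		</plugin>
	</plugins>
</build>
```

The plugin entry would sit alongside the existing compiler and failsafe plugins in the build section of the pom.xml shown in the question.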