
Map Reduce with Kite?


Hi,

 

I am new to Hadoop and have been learning it through online tutorials. Recently I started learning Kite and wanted to integrate it with MapReduce to store data as Avro files.

 

By the way, I am using the Dataset module. I have a few questions; can anyone help me with these?

 

1. How can we integrate MapReduce with the Dataset module, i.e., write mapper and reducer classes against it? In HelloKite.java we just write a few hello records, but how does that handle big data? Where do the mappers and reducers run for it? It simply creates an Avro file, so how does it cope as the data keeps growing?

 

2. Reading a dataset also just pulls data from Avro files. What happens if the Avro files are huge? Where are the mappers and reducers for this? Won't it take a long time to produce output?

 

3. To handle larger data, I tried to integrate the Dataset module with the word count example, using a 1 GB input file. My sample code only created the dataset folder; the data is never written to it.

 

package org.kitesdk.examples.data;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetDescriptor;
import org.kitesdk.data.DatasetWriter;
import org.kitesdk.data.Datasets;

/*
 * To define a reduce function for your MapReduce job, subclass
 * the Reducer class and override the reduce method.
 * The class definition requires four type parameters:
 * the input key type (the output key type from the mapper),
 * the input value type (the output value type from the mapper),
 * the output key type, and the output value type.
 */
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private Dataset<WordSchema> wordDataset = null;
  private DatasetWriter<WordSchema> writer = null;

  @Override
  protected void setup(Context context) throws IOException,
      InterruptedException {
    String datasetUri = "dataset:file:/home/training/kitewordcount";

    // Create the dataset, inferring the Avro schema from WordSchema.
    // (With more than one reducer, the dataset should be created once
    // in the driver instead, since create() fails if it already exists.)
    DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
        .schema(WordSchema.class)
        .build();
    wordDataset = Datasets.create(datasetUri, descriptor, WordSchema.class);

    // Open one writer for the lifetime of this reducer task,
    // instead of opening a new writer for every key in reduce().
    writer = wordDataset.newWriter();
  }

  /*
   * The framework calls cleanup (note the exact lowercase name; a
   * method named cleanUp is never invoked) once at the end of the
   * task. Closing the writer here flushes buffered records; without
   * it the dataset directory is created but no data is written.
   */
  @Override
  protected void cleanup(Context context) throws IOException,
      InterruptedException {
    if (writer != null) {
      writer.close();
    }
  }

  /*
   * The reduce method runs once for each key received from
   * the shuffle and sort phase of the MapReduce framework.
   * The method receives a key of type Text, a set of values of type
   * IntWritable, and a Context object.
   */
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;

    // Add each value passed in by the mappers to the count for this key.
    for (IntWritable value : values) {
      wordCount += value.get();
    }

    // Emit the total through the normal MapReduce output path...
    context.write(key, new IntWritable(wordCount));

    // ...and also append it to the Kite dataset.
    writer.write(new WordSchema(key.toString(), Integer.toString(wordCount)));
  }
}

class WordSchema {
  private String word;
  private String wordCount;

  public WordSchema(String word, String count) {
    this.word = word;
    this.wordCount = count;
  }

  // No-arg constructor required for Avro reflection.
  public WordSchema() {
  }

  public String getWord() {
    return word;
  }

  public void setWord(String word) {
    this.word = word;
  }

  public String getWordCount() {
    return wordCount;
  }

  public void setWordCount(String wordCount) {
    this.wordCount = wordCount;
  }
}

 

Eagerly waiting for your input. I hope someone can help me with these questions.

 

Thanks,

Azzu


Re: Map Reduce with Kite?


Hi Azzu!

 

Kite has a dedicated module for reading/writing Kite datasets using MapReduce. We don't have an example that demonstrates MapReduce support, but perhaps taking a look at this test will get you started:

 

https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-mapreduce/src/test/java/org/kitesdk...
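
To give a feel for what that module's usage looks like, here is a minimal driver sketch based on the `DatasetKeyInputFormat`/`DatasetKeyOutputFormat` classes in kite-data-mapreduce. The dataset URIs and the `MyMapper`/`MyReducer` class names are placeholders of mine, not from your code; treat this as a sketch to check against the linked test rather than a definitive recipe:

```java
import org.apache.avro.generic.GenericData;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.kitesdk.data.mapreduce.DatasetKeyInputFormat;
import org.kitesdk.data.mapreduce.DatasetKeyOutputFormat;

public class KiteJobDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "kite-mapreduce");
    job.setJarByClass(KiteJobDriver.class);

    // Read records from an existing Kite dataset. Input splits are
    // computed from the dataset's underlying files, so one mapper
    // runs per split, just as with plain HDFS input -- this is how
    // large datasets are handled in parallel.
    DatasetKeyInputFormat.configure(job)
        .readFrom("dataset:hdfs:/data/words");      // placeholder URI

    // Write reducer output records into another Kite dataset, so the
    // framework (not a hand-rolled DatasetWriter in the reducer)
    // manages opening, flushing, and closing writers.
    DatasetKeyOutputFormat.configure(job)
        .writeTo("dataset:hdfs:/data/wordcounts");  // placeholder URI

    // With these formats the record travels as the map/reduce key
    // and the value is NullWritable.
    job.setMapperClass(MyMapper.class);   // hypothetical classes
    job.setReducerClass(MyReducer.class);
    job.setOutputKeyClass(GenericData.Record.class);
    job.setOutputValueClass(NullWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This also answers the "where are the mappers and reducers?" question: they come from the job you configure; Kite only supplies the input/output formats and the Avro (de)serialization.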

 

You may also want to check out our Crunch integration. Crunch provides a higher-level API for writing data pipelines and supports both MapReduce and Spark as execution engines. We have a good example of using Crunch to process data in the demo example:

 

https://github.com/kite-sdk/kite-examples/tree/master/demo/demo-crunch
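
As a rough idea of what the Crunch version of your word count might look like, here is a sketch. The input/output paths are placeholders, and writing into a Kite dataset would go through the kite-data-crunch module (e.g. a `CrunchDatasets` target) rather than the text-file output used here, so check the demo-crunch example for the exact wiring:

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.From;
import org.apache.crunch.types.writable.Writables;

public class CrunchWordCount {
  public static void main(String[] args) throws Exception {
    // MRPipeline plans and runs the underlying MapReduce jobs,
    // so splits and parallelism are handled for you.
    Pipeline pipeline = new MRPipeline(CrunchWordCount.class);

    PCollection<String> lines =
        pipeline.read(From.textFile("/data/input"));   // placeholder path

    // Split each line into words.
    PCollection<String> words = lines.parallelDo(
        new DoFn<String, String>() {
          @Override
          public void process(String line, Emitter<String> emitter) {
            for (String word : line.split("\\s+")) {
              emitter.emit(word);
            }
          }
        }, Writables.strings());

    // count() yields a PTable of word -> number of occurrences.
    PTable<String, Long> counts = words.count();

    pipeline.writeTextFile(counts, "/data/output");    // placeholder path
    pipeline.done();
  }
}
```

The same pipeline can run on Spark by swapping the pipeline implementation, which is the main attraction over hand-written mappers and reducers.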

 

Let us know if you need more resources to get started.

 

-Joey