MRUnit with DistributedCache support

Contributor

Hi all,

I have a simple mapper that reads data from a log file, joins it with data from another file, and sends the combined output to the reducer for further processing.

In the mapper I use the DistributedCache because the second file is small. It works properly.

Now I have to write some MRUnit test cases for that mapper. Can anyone help me with a code example showing how to use MRUnit with DistributedCache support?

I am using Hadoop 2, and the MRUnit version is as follows:

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier> 
</dependency>

In the driver class I added the file to the DistributedCache as follows (this is just to show how I added the cache in the MR job):

Job job = Job.getInstance(conf);
job.setJarByClass(ReportDriver.class);
job.setJobName("Report");
job.addCacheFile(new Path("zone.txt").toUri());
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(ReportMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(ReportReducer.class);
job.setNumReduceTasks(3);
//job.setCombinerClass(ReportReducer.class);
logger.info("map job started ---------------");
System.exit(job.waitForCompletion(true) ? 0 : 1);

In the mapper class I am fetching the cache file like this:

@Override
protected void setup(Context context) throws IOException, InterruptedException 
{
        URI[] localPaths = context.getCacheFiles();
}
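The setup() above only fetches the cache URIs. Below is a minimal sketch of one way the cached file's contents might be turned into a lookup table for the join. It assumes, hypothetically, that zone.txt holds one tab-separated id and zone name per line (the real format is not given in this thread); the class name ZoneLookup and the method load are made up for illustration. The parser takes a plain Reader, so it does not depend on Hadoop or MRUnit; how the localized file is opened inside setup() depends on the environment and is left out here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: parses the cached lookup file into a map.
// Assumed line format (not confirmed by the original post): id<TAB>zoneName
class ZoneLookup {

    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> zones = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                // Skip blank or malformed lines instead of failing the task.
                if (parts.length == 2 && !parts[0].trim().isEmpty()) {
                    zones.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return zones;
    }
}
```

In setup(), the mapper would pass a reader for the cache file to ZoneLookup.load and keep the returned map in a field for use in map(). In a test, a StringReader with a few sample lines is enough.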

Please help me with a code example if anyone has used DistributedCache with MRUnit.

Thanks a lot.

1 ACCEPTED SOLUTION

Master Mentor

Hi Biswajit, are you aware that MRUnit is pretty much a dead project? No more development is being done on it. That said, I found an example here: http://stackoverflow.com/questions/15674229/mapreduce-unit-test-fails-to-mock-distributedcache-getlo...

6 REPLIES

Contributor

Hi, thanks for your reply.

Oh, is it? I did not know that. Do you know of any other unit testing tool for M/R jobs?

I went through the URL as well. It is helpful, but I ran into the same issue reported on Stack Overflow: a "Null Pointer" exception when the mapper tries to get the path/URI from the configuration.

Master Mentor

Unfortunately, there are none that I know of. We do have one engineer who provides support for his unit-testing framework, which has MapReduce unit testing capabilities, though I am not sure whether DistributedCache testing is supported: https://github.com/sakserv/hadoop-mini-clusters
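Independent of any test framework, the join step itself can be checked with plain Java if it is kept in its own class. The following is only a sketch under assumptions: the log line format (zone id, a comma, then the message) is hypothetical because the thread does not show it, and the names ZoneJoiner and join are invented for illustration. It does not exercise the DistributedCache; it only covers the logic that runs once the lookup data has been loaded.

```java
import java.util.Map;

// Hypothetical helper holding the mapper's join logic, free of Hadoop types.
// Assumed log line format (not from the original post): zoneId,message
class ZoneJoiner {
    private final Map<String, String> zonesById;

    ZoneJoiner(Map<String, String> zonesById) {
        this.zonesById = zonesById;
    }

    // Returns "zoneName<TAB>message", or null when the line is malformed
    // or its zone id is missing from the lookup table.
    String join(String logLine) {
        int comma = logLine.indexOf(',');
        if (comma <= 0) {
            return null;
        }
        String zoneName = zonesById.get(logLine.substring(0, comma).trim());
        if (zoneName == null) {
            return null;
        }
        return zoneName + "\t" + logLine.substring(comma + 1).trim();
    }
}
```

With this split, map() would call join(value.toString()) and write the result only when it is non-null, so an ordinary JUnit test can assert on join directly with a small hand-built map.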

Master Mentor

@Biswajit Chakraborty I can't believe I didn't think of this earlier. Take a look at https://apache.googlesource.com/mrunit/+/e43ef01dd1199a7eb0963edbf05258a8609bf0dc/src/test/java/org/... This is MRUnit's own DistributedCache test, with examples of how to set it up.

Contributor

Yippee! Artem, thanks a lot 🙂

You really saved my day. Thanks again.

Master Mentor

Please consider publishing an article on this. Others will find it useful, as it's not an obvious find.