Archives of Support Questions (Read Only)


MRUnit with DistributedCache support

New Member

Hi all,

I have a simple mapper that reads data from a log file, joins it with data from another file, and sends the combined output to the reducer for further processing.

In the mapper I use DistributedCache because the second file is small. It is working properly.

Now I have to write MRUnit test cases for that mapper. Can anyone help me with a code example showing how to write an MRUnit test with DistributedCache support?

I am using Hadoop 2, and my MRUnit dependency is as follows:

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier> 
</dependency>

In the driver class I added the file to the DistributedCache like this (this is just to show how I added the cache file in the MR job):

Job job = Job.getInstance(conf);
job.setJarByClass(ReportDriver.class);
job.setJobName("Report");
job.addCacheFile(new Path("zone.txt").toUri());
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(ReportMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(ReportReducer.class);
job.setNumReduceTasks(3);
//job.setCombinerClass(ReportReducer.class);
logger.info("map job started ---------------");
System.exit(job.waitForCompletion(true) ? 0 : 1);

In the mapper class I fetch the cache file like this:

@Override
protected void setup(Context context) throws IOException, InterruptedException 
{
        URI[] localPaths = context.getCacheFiles();
}
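
For illustration, one way to make a join like this easier to test is to keep the lookup and join logic in a plain class that setup() and map() delegate to. The sketch below uses only the Java standard library. The class name ZoneJoin, its method names, and the "zoneId,zoneName" and "zoneId,message" line formats are assumptions made for the example; the actual formats of zone.txt and the log file are not shown in this post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: holds the join logic outside the Mapper so it can be tested directly. */
final class ZoneJoin {

    private ZoneJoin() {}

    /** Loads "zoneId,zoneName" lines (assumed format) into a lookup map; malformed lines are skipped. */
    static Map<String, String> loadZones(Reader source) {
        Map<String, String> zones = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    zones.put(parts[0].trim(), parts[1].trim());
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers and tests do not need a throws clause.
            throw new UncheckedIOException(e);
        }
        return zones;
    }

    /** Joins one "zoneId,message" log line (assumed format); returns null when there is no match. */
    static String join(String logLine, Map<String, String> zones) {
        String[] parts = logLine.split(",", 2);
        if (parts.length < 2) {
            return null;
        }
        String zoneName = zones.get(parts[0].trim());
        return zoneName == null ? null : zoneName + "\t" + parts[1].trim();
    }
}
```

With a helper along these lines, setup() would only need to open a reader on the cached file and pass it to loadZones, and the join itself can be covered by ordinary unit tests that feed in a StringReader, without involving MRUnit or the DistributedCache.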

If anyone has used DistributedCache with MRUnit, please help me out with a code example.

Thanks a lot.

1 ACCEPTED SOLUTION

Master Mentor

Hi Biswajit, are you aware that MRUnit is pretty much a dead project? There is no more development being done on it. That said, I found an example here: http://stackoverflow.com/questions/15674229/mapreduce-unit-test-fails-to-mock-distributedcache-getlo...


6 REPLIES

New Member

Hi, thanks for your reply.

Oh, is it? I did not know that. Do you know of any other unit testing tool for M/R jobs?

I went through the URL as well. It is helpful, but I ran into the same issue reported on Stack Overflow: a NullPointerException when the mapper tries to get the path/URI from the configuration.

Master Mentor

Unfortunately, there are none that I know of. We do have one engineer who supports his own unit-testing framework, which has MapReduce unit testing capabilities, though I am not sure whether DistributedCache testing is supported: https://github.com/sakserv/hadoop-mini-clusters

Master Mentor

@Biswajit Chakraborty I can't believe I didn't think of this earlier. Take a look at https://apache.googlesource.com/mrunit/+/e43ef01dd1199a7eb0963edbf05258a8609bf0dc/src/test/java/org/... This is MRUnit's own DistributedCache test, with examples of how to set it up.

New Member

Yippee! Artem, thanks a lot 🙂

You really saved my day. Thanks again.

Master Mentor

Please consider publishing an article on this. Others will find it useful, as it's not an obvious find.