MRUnit with DistributedCache support

New Contributor

Hi all,

I have a simple mapper that reads data from a log file, joins it with data from another file, and sends the combined output to the reducer for further processing.

Because the second file is small, the mapper reads it through the DistributedCache. This works properly.

Now I have to write MRUnit test cases for that mapper. Can anyone help me with a code example showing how to write an MRUnit test with DistributedCache support?

I am using Hadoop 2, and the MRUnit dependency is as follows:

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <classifier>hadoop2</classifier> 
</dependency>

In the Driver class I added the file to the DistributedCache like this (shown only to explain how the cache is set up in the MR job):

Job job = Job.getInstance(conf);
job.setJarByClass(ReportDriver.class);
job.setJobName("Report");
job.addCacheFile(new Path("zone.txt").toUri());
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(ReportMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(ReportReducer.class);
job.setNumReduceTasks(3);
//job.setCombinerClass(ReportReducer.class);
logger.info("map job started ---------------");
System.exit(job.waitForCompletion(true) ? 0 : 1);

In the Mapper class I fetch the cached file like this:

@Override
protected void setup(Context context) throws IOException, InterruptedException 
{
        URI[] localPaths = context.getCacheFiles();
}
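
A minimal sketch of how the join against the cached file could be kept in a plain class, so that the lookup logic can be tested without any DistributedCache plumbing. Everything here is an assumption made for illustration: the class name ZoneLookup, the "id,zoneName" layout of zone.txt, and the tab-separated log record are hypothetical and not taken from the real job. In setup(), the reader would be opened on the file behind localPaths[0].

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper holding the small lookup table the mapper joins against. */
final class ZoneLookup {
    private final Map<String, String> zones = new HashMap<>();

    /** Loads lines of the assumed form "id,zoneName"; malformed lines are skipped. */
    static ZoneLookup load(BufferedReader reader) throws IOException {
        ZoneLookup lookup = new ZoneLookup();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                lookup.zones.put(parts[0].trim(), parts[1].trim());
            }
        }
        return lookup;
    }

    /**
     * Joins one log record of the assumed form "zoneId<TAB>message" with its
     * zone name. Returns null when the record is malformed or the id is unknown.
     */
    String join(String logLine) {
        String[] fields = logLine.split("\t", 2);
        if (fields.length < 2) {
            return null;
        }
        String zone = zones.get(fields[0]);
        return zone == null ? null : zone + "\t" + fields[1];
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contents of the cached file.
        ZoneLookup lookup = ZoneLookup.load(
                new BufferedReader(new StringReader("1,north\n2,south\n")));
        System.out.println(lookup.join("2\tdisk full"));
    }
}
```

With this split, map() only calls join() on each record, and the lookup class itself can be exercised from an ordinary unit test using an in-memory reader.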

If anyone has used the DistributedCache with MRUnit, please help me out with a code example.

Thanks a lot.

1 ACCEPTED SOLUTION

Re: MRUnit with DistributedCache support

Mentor

Hi Biswajit, are you aware that MRUnit is pretty much a dead project? There is no more development being done on it. That said, I found an example here: http://stackoverflow.com/questions/15674229/mapreduce-unit-test-fails-to-mock-distributedcache-getlo...

Re: MRUnit with DistributedCache support

New Contributor

Hi, thanks for your reply.

Oh, is it? I did not know that. Do you know of any other unit-testing tool for M/R jobs?

I also went through the URL. It is helpful, but I ran into the same issue reported on Stack Overflow: a NullPointerException when the mapper tries to get the path/URI from the configuration.

Re: MRUnit with DistributedCache support

Mentor

Unfortunately, there are none that I know of. We do have one engineer who supports his own unit-testing framework, which has MapReduce unit-testing capabilities, though I am not sure whether DistributedCache testing is supported: https://github.com/sakserv/hadoop-mini-clusters

Re: MRUnit with DistributedCache support

Mentor

@Biswajit Chakraborty I can't believe I didn't think of this earlier. Take a look at https://apache.googlesource.com/mrunit/+/e43ef01dd1199a7eb0963edbf05258a8609bf0dc/src/test/java/org/... This is MRUnit's own DistributedCache test, with examples of how to set it up.

Re: MRUnit with DistributedCache support

New Contributor

Yippee! Artem, thanks a lot :-)

You really saved my day. Thanks again.

Re: MRUnit with DistributedCache support

Mentor

Please consider publishing an article on this; others will find it useful, as it's not an obvious find.
