Created 08-11-2016 11:19 AM
Hi all,
I have a simple mapper that reads some data from a log file, joins it with data from another file, and sends the combined output to the reducer for further processing.
In the mapper I have used DistributedCache because the second file is small. It's working properly.
Now I have to write some MRUnit test cases for that mapper. Can anyone help me with a code example of how to write an MRUnit test with DistributedCache support?
I am using Hadoop 2, and my MRUnit dependency is as follows:
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>1.1.0</version>
  <classifier>hadoop2</classifier>
</dependency>
In the Driver class I added the DistributedCache file like this (this is just to show how I added the cache in the MR job):
Job job = Job.getInstance(conf);
job.setJarByClass(ReportDriver.class);
job.setJobName("Report");
job.addCacheFile(new Path("zone.txt").toUri());
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(ReportMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(ReportReducer.class);
job.setNumReduceTasks(3);
//job.setCombinerClass(ReportReducer.class);
logger.info("map job started ---------------");
System.exit(job.waitForCompletion(true) ? 0 : 1);
In the Mapper class I am fetching the cached file like this:
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // URIs registered in the driver via job.addCacheFile(...), i.e. zone.txt
    URI[] localPaths = context.getCacheFiles();
}
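One way to keep most of this mapper testable without DistributedCache at all is to move the join logic out of the Mapper into a plain class that setup() feeds from the cached file. The sketch below is hypothetical: the class name ZoneJoiner, the zone.txt layout ("zoneId<TAB>zoneName"), and the log-line layout (zone id as the first comma-separated field) are all assumptions, not taken from the original job.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper holding the map-side join logic, kept free of Hadoop
 * types so it can be tested with a StringReader instead of a cache file.
 * Assumed formats: zone.txt lines are "zoneId<TAB>zoneName", and the zone id
 * is the first comma-separated field of each log line.
 */
class ZoneJoiner {
    private final Map<String, String> zones = new HashMap<>();

    /** Loads the small lookup file; in setup() the Reader would wrap the cached zone.txt. */
    void loadZones(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) { // skip malformed lines
                    zones.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
    }

    /** Returns "zoneName,logLine", or null when the zone id has no join partner. */
    String join(String logLine) {
        String zoneId = logLine.split(",", 2)[0].trim();
        String zoneName = zones.get(zoneId);
        return zoneName == null ? null : zoneName + "," + logLine;
    }

    /** Number of zones loaded from the lookup file. */
    int size() {
        return zones.size();
    }
}
```

With this split, the Mapper's setup() only opens the cached file and calls loadZones, and map() calls join, so a plain JUnit test covers the join itself and only the thin Hadoop wiring still needs MRUnit and a cache file.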
Please help me if anyone has used DistributedCache with MRUnit, ideally with a code example.
Thanks a lot.
Created 08-11-2016 12:52 PM
Hi Biswajit, are you aware that MRUnit is pretty much a dead project? There is no more development being done on it. That said, I found an example here: http://stackoverflow.com/questions/15674229/mapreduce-unit-test-fails-to-mock-distributedcache-getlo...
Created 08-11-2016 01:47 PM
Hi, thanks for your reply.
Oh, is it? I did not know that. Do you know of any other unit testing tool for MapReduce jobs?
I went through the URL as well. It's helpful, but I ran into the same issue reported on Stack Overflow: a NullPointerException when the mapper tries to get the path/URI from the configuration.
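A likely cause of that NullPointerException is that, in Hadoop 2, context.getCacheFiles() can return null when no cache file has been registered in the job's configuration (as happens in a test harness that never adds zone.txt), and the mapper then indexes that array. The hypothetical guard below, CacheFileLocator, is an assumption of this sketch and not part of Hadoop or MRUnit; it fails fast with a descriptive message instead of an NPE.

```java
import java.io.IOException;
import java.net.URI;

/**
 * Hypothetical guard for Mapper.setup(): fails with a clear IOException when
 * the expected cache file is missing, instead of an NPE on a null array.
 */
final class CacheFileLocator {
    private CacheFileLocator() {}

    static URI find(URI[] cacheFiles, String fileName) throws IOException {
        if (cacheFiles == null || cacheFiles.length == 0) {
            throw new IOException("No cache files registered; expected " + fileName);
        }
        for (URI uri : cacheFiles) {
            String path = uri.getPath();
            // Compare only the last path segment, e.g. "/apps/zone.txt" -> "zone.txt".
            if (path != null && path.substring(path.lastIndexOf('/') + 1).equals(fileName)) {
                return uri;
            }
        }
        throw new IOException(fileName + " not found among " + cacheFiles.length + " cache file(s)");
    }
}
```

In setup() this would be called as CacheFileLocator.find(context.getCacheFiles(), "zone.txt"), so a test that forgets to register the cache file reports exactly that rather than failing somewhere inside the mapper.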
Created 08-11-2016 06:05 PM
Unfortunately, there are none that I know of. We do have one engineer who provides support for his unit-testing framework, which has MapReduce unit testing capabilities, though I am not sure whether DistributedCache testing is supported: https://github.com/sakserv/hadoop-mini-clusters
Created 08-11-2016 06:10 PM
@Biswajit Chakraborty I can't believe I didn't think of this earlier: take a look at https://apache.googlesource.com/mrunit/+/e43ef01dd1199a7eb0963edbf05258a8609bf0dc/src/test/java/org/... This is MRUnit's own DistributedCache test, with examples of how to set it up.
Created 08-11-2016 07:42 PM
Yay! Artem, thanks a lot 🙂
You really saved my day. Thanks again.
Created 08-11-2016 07:50 PM
Please consider publishing an article on this; others will find it useful, as it's not an obvious find.