New Contributor

Build dynamic reference and consume in same batch

I need to build a dynamic reference (lookup) from a particular batch of files and then use that reference while processing the same set of files. Is this possible?


Can multiple threads write simultaneously to a lookup file in Hadoop? I want to process 20 files in parallel, and all of them have to update one file simultaneously. Is this possible?



Cloudera Employee

Re: Build dynamic reference and consume in same batch

I am not completely clear on what you want to do, but the whole notion of writing in one thread and reading in another does not fit MapReduce. Each mapper runs in its own JVM and can run on any node in the cluster. Mappers are standalone processes, and the framework gives them no way to communicate with each other directly. You might want to look at leveraging HBase for writing the information and reading it back again in other mappers.


Then, at the end, if you use a single reducer, it receives all the mapper output and can reduce it to one file.
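
To make the single-reducer idea concrete, here is a rough sketch in plain Java. It does not use the Hadoop API; it only imitates the pattern. Each "mapper" handles one file on its own and returns a partial result, and one "reducer" merges all partial results into a single lookup, so nothing ever writes to a shared file concurrently. The class name, the method names and the "key,value" line format are all assumptions made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SingleReducerSketch {

    // Hypothetical "mapper": processes one file in isolation and returns
    // its own partial key/value pairs. It shares no state with other mappers.
    static Map<String, String> mapOneFile(List<String> lines) {
        Map<String, String> partial = new TreeMap<>();
        for (String line : lines) {
            // Assumed record format: "key,value"; lines without a comma are skipped.
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                partial.put(parts[0].trim(), parts[1].trim());
            }
        }
        return partial;
    }

    // Hypothetical single "reducer": receives every mapper's output and
    // merges it into one lookup. On duplicate keys the later entry wins.
    static Map<String, String> reduceAll(List<Map<String, String>> partials) {
        Map<String, String> lookup = new TreeMap<>();
        for (Map<String, String> partial : partials) {
            lookup.putAll(partial);
        }
        return lookup;
    }

    // Runs the "mappers" in parallel, then merges their results in one place.
    static Map<String, String> buildLookup(List<List<String>> files) {
        List<Map<String, String>> partials = files.parallelStream()
                .map(SingleReducerSketch::mapOneFile)
                .collect(Collectors.toList());
        return reduceAll(partials);
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a,1", "b,2"),
                Arrays.asList("c,3"));
        System.out.println(buildLookup(files));
    }
}
```

In a real job the mapper and reducer would be Hadoop classes and the merge step would be the one reducer writing its output file; the sketch only shows why no simultaneous writes to one file are needed.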