04-30-2015 06:15 PM
I need to build a dynamic reference (lookup) from a particular batch of files and then use that lookup while processing the same set of files. Is this possible?
Can multiple threads write simultaneously to a single lookup file in Hadoop? I want to process 20 files in parallel, and all of them have to update one file at the same time. Is this possible?
05-13-2015 07:47 PM
I am not completely clear on what you want to do, but the whole notion of writing in one thread and reading in another does not fit MapReduce. Each mapper runs in its own JVM, and it can run on any node in the cluster. Mappers are standalone processes, and the framework gives them no direct way to communicate with each other. HDFS also allows only one writer per file at a time, so 20 tasks cannot update the same file simultaneously. You might want to look at using HBase: write the information there and read it back in other mappers.
Then, at the end, if you use a single reducer, it receives all the mapper output and can reduce it to one file.
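To make the idea concrete, here is a rough sketch of the pattern in plain Java. It does not use the Hadoop API; it only imitates the shape of a job, with threads standing in for mappers. The class name LookupMergeSketch, the method names, and the "key=value" line format are all made up for illustration. Each "mapper" builds its own partial lookup from its own file and never touches shared state, and a single "reducer" merges the partial results into one lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: plain Java threads imitating mappers and one reducer.
class LookupMergeSketch {

    // "Map" step: a task sees only its own file and returns its own partial lookup.
    // The "key=value" line format is an assumption made for this sketch.
    static Map<String, String> mapOneFile(List<String> lines) {
        Map<String, String> partial = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("=", 2);
            if (parts.length == 2) {
                partial.put(parts[0].trim(), parts[1].trim());
            }
        }
        return partial;
    }

    // "Reduce" step: one merger combines all partial lookups, so there is
    // exactly one writer. On duplicate keys, later files win.
    static Map<String, String> reduceAll(List<Map<String, String>> partials) {
        Map<String, String> lookup = new TreeMap<>();
        for (Map<String, String> partial : partials) {
            lookup.putAll(partial);
        }
        return lookup;
    }

    // Runs the map tasks in parallel, then merges their results in input order.
    static Map<String, String> buildLookup(List<List<String>> files) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (List<String> file : files) {
                futures.add(pool.submit(() -> mapOneFile(file)));
            }
            List<Map<String, String>> partials = new ArrayList<>();
            for (Future<Map<String, String>> future : futures) {
                partials.add(future.get());
            }
            return reduceAll(partials);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a=1", "b=2"),
                Arrays.asList("c=3"));
        System.out.println(buildLookup(files)); // prints {a=1, b=2, c=3}
    }
}
```

In a real job the same shape would likely be a first pass that builds the lookup, with the number of reduce tasks set to one (Job.setNumReduceTasks(1)) so that a single output file is written, followed by a second pass over the same input files that reads the finished lookup.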