Build dynamic reference and consume in same batch

I need to build a dynamic reference (lookup) from a particular batch of files and then use that reference while processing the same set of files. Is this possible?


Can multiple threads write simultaneously into a lookup file in Hadoop? I want to process 20 files in parallel, and all of them have to update the same file at the same time. Is this possible?




Re: Build dynamic reference and consume in same batch

I am not completely clear on what you want to do, but the whole notion of writing in one thread and reading in another does not fit MapReduce. Each mapper runs in its own JVM, and a mapper can run on any node in the cluster. Mappers are standalone processes, and there is no way for them to communicate with each other. You might want to look at using HBase to write the information and then read it back in other mappers.


Then, at the end, if you use a single reducer, it receives all of the mapper output and can reduce it into one file.
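To make the single-reducer idea concrete, here is a minimal plain-Python simulation of the pattern, not Hadoop code. Each "mapper" sees only its own file and emits key/value pairs without touching any shared state, and one "reducer" receives every pair and builds a single lookup. The input format (`key,value` lines), the function names `map_file`, `single_reducer` and `run_job`, and the merge rule (keep the sorted distinct values per key) are all hypothetical choices for illustration. In an actual Hadoop job, forcing one reducer is typically done with `Job.setNumReduceTasks(1)`.

```python
from collections import defaultdict


def map_file(file_name, lines):
    """Hypothetical mapper: reads only its own file and emits (key, value) pairs.

    Assumption: each line looks like "key,value"; the real lookup format
    is not given in the question. file_name is unused here but mirrors
    the fact that each mapper handles exactly one input.
    """
    for line in lines:
        key, value = line.split(",", 1)
        yield key.strip(), value.strip()


def single_reducer(pairs):
    """Hypothetical single reducer: sees every mapper's output at once."""
    grouped = defaultdict(set)
    for key, value in pairs:
        grouped[key].add(value)
    # Assumed merge rule: keep the sorted distinct values for each key.
    return {key: sorted(values) for key, values in sorted(grouped.items())}


def run_job(files):
    """Simulates the job: independent mappers, then one reducer."""
    all_pairs = []
    for name, lines in files.items():
        # Each mapper runs in isolation; no mapper writes to a shared file.
        all_pairs.extend(map_file(name, lines))
    # Shuffle plus a single reducer yields one output (here, one dict).
    return single_reducer(all_pairs)
```

For example, `run_job({"file1": ["a,1", "b,2"], "file2": ["a,3", "c,4"]})` returns `{'a': ['1', '3'], 'b': ['2'], 'c': ['4']}`. The point of the design is that no mapper ever needs to update the lookup concurrently; the merge happens in exactly one place.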