
How can all the mappers access the same data throughout the input file?


My question seems to have been unclear, so I will explain with an example.

 

My task is to append the first row of my file to all the remaining rows (millions of rows) in that file.

 

Example:

 

I have a 200 MB input file like the one below:

 

abc123
zsd     456     def     123123     adadfdf
jdf     342     dsf     234234     asdfasdf
tkj     745     lkh     413531     kljkdfga

 

The output should be:

 

zsd     456     def     123123     adadfdf      abc123
jdf     342     dsf     234234     asdfasdf     abc123
tkj     745     lkh     413531     kljkdfga     abc123

 

What I did was store the first row in a member variable of the map class and then append it to all the remaining rows.

The problem is that the 200 MB file is split into 4 blocks (3 × 64 MB and 1 × 8 MB). The output from the first mapper is correct, but the other mappers append the first row of their own blocks instead.
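To see why the member-variable approach breaks, here is a hypothetical reconstruction in plain Python (not the Hadoop API; the actual mapper code is not shown, so `naive_mapper` is only a guess at its behavior). Each "mapper" sees only its own split, so the first line it encounters is the first line of that split, not of the file. Splits are modeled as lists of whole lines, a simplification: Hadoop defines splits in bytes, and its line reader makes each mapper process whole lines.

```python
# Simulation only: plain Python, not Hadoop. A "split" is a list of whole lines.

def split_into_chunks(lines, lines_per_split):
    """Cut the input into consecutive splits of at most lines_per_split lines."""
    return [lines[i:i + lines_per_split] for i in range(0, len(lines), lines_per_split)]

def naive_mapper(split):
    """Hypothetical mapper that remembers the first line it sees as the header.
    Because it only sees its own split, that is the first line of the split."""
    header = None
    out = []
    for line in split:
        if header is None:
            header = line  # first line of THIS split, not of the file
            continue
        out.append(f"{line}\t{header}")
    return out

lines = [
    "abc123",
    "zsd 456 def 123123 adadfdf",
    "jdf 342 dsf 234234 asdfasdf",
    "tkj 745 lkh 413531 kljkdfga",
]

# Two "mappers", two lines each.
results = [naive_mapper(s) for s in split_into_chunks(lines, 2)]
for r in results:
    print(r)
```

In this version the second mapper appends "jdf ..." instead of "abc123", and the first line of every split after the first is also swallowed as a "header" and dropped.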

 

How can I complete this task? I need to append the same "abc123" to every row of my input file.

 

PS: My job has only a mapper and no reducer.

 

 

 

 


Re: How to make all the mappers to access the same data through out the input file?

You need a smarter algorithm: take the input file plus the string to append as a separate input,
and apply that string to all the rows (e.g. in Pig). Alternatively, you could have only one big map,
but you would lose all the parallelism:
http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop
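The first suggestion can be sketched in plain Python as follows, under stated assumptions: the header is read once up front (a "driver" step) and the same value is handed to every mapper, which then skips the header line itself by its byte offset 0. In a real Hadoop job this would likely mean reading the first line in the driver and passing it through the job `Configuration` (`Configuration.set` in the driver, `context.getConfiguration().get` in the mapper); that wiring is an assumption and is not shown here. The function names are hypothetical, and tab-separated output is an arbitrary choice.

```python
def with_offsets(lines):
    """Pair each line with its byte offset in the file (newline-terminated lines
    assumed), similar to the key TextInputFormat passes to map()."""
    pairs, offset = [], 0
    for line in lines:
        pairs.append((offset, line))
        offset += len(line.encode("utf-8")) + 1
    return pairs

def read_header(lines):
    """'Driver' step: read the first line once, before any mapper runs."""
    return lines[0]

def make_mapper(header):
    """Every mapper receives the same header instead of discovering it itself."""
    def mapper(split):
        out = []
        for offset, line in split:
            if offset == 0:
                continue  # the header line itself (offset 0) is not a record
            out.append(f"{line}\t{header}")
        return out
    return mapper

lines = [
    "abc123",
    "zsd 456 def 123123 adadfdf",
    "jdf 342 dsf 234234 asdfasdf",
    "tkj 745 lkh 413531 kljkdfga",
]
pairs = with_offsets(lines)
splits = [pairs[i:i + 2] for i in range(0, len(pairs), 2)]  # two "mappers"
mapper = make_mapper(read_header(lines))
output = [rec for split in splits for rec in mapper(split)]
for rec in output:
    print(rec)
```

Because the header travels with the job rather than with the data, the number of splits no longer matters and all mappers can still run in parallel.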


Re: How to make all the mappers to access the same data through out the input file?


Hi Romain,

Thank you, your suggestion might solve the current problem.

I have a similar issue where my file contains 4 headers, each followed by its corresponding records. Each header has to be appended to its own records.

How can I achieve that? Any suggestions?
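One possible approach for several headers, sketched in plain Python under stated assumptions: make a cheap first pass that records the byte offset and text of every header line, give that small table to every mapper, and have each mapper look up the last header that starts at or before each record's offset with a binary search (`bisect`). This works even when a header and its records fall in different splits. The rule for recognizing a header (`is_header`, "a single field like abc123") is hypothetical, as are the sample lines and the other function names.

```python
import bisect

def is_header(line):
    # Hypothetical rule: a header is a single field (like "abc123"),
    # while records have several whitespace-separated fields.
    return len(line.split()) == 1

def with_offsets(lines):
    """Pair each line with its byte offset (newline-terminated lines assumed)."""
    pairs, offset = [], 0
    for line in lines:
        pairs.append((offset, line))
        offset += len(line.encode("utf-8")) + 1
    return pairs

def index_headers(pairs):
    """First pass: collect (offset, text) of every header. With only a few
    headers this table is tiny and can be handed to every mapper."""
    table = [(off, line) for off, line in pairs if is_header(line)]
    return [off for off, _ in table], [line for _, line in table]

def make_mapper(header_offsets, headers):
    def mapper(split):
        out = []
        for offset, line in split:
            if is_header(line):
                continue  # header lines are not records
            # Index of the last header starting at or before this record.
            i = bisect.bisect_right(header_offsets, offset) - 1
            if i < 0:
                continue  # record before the first header: skipped in this sketch
            out.append(f"{line}\t{headers[i]}")
        return out
    return mapper

# Made-up sample: two headers, each followed by its records.
lines = ["hdr1", "a 1 b 2 c", "a 3 b 4 c", "hdr2", "d 5 e 6 f", "d 7 e 8 f"]
pairs = with_offsets(lines)
# Two lines per "mapper", so some records land in a different split than their header.
splits = [pairs[i:i + 2] for i in range(0, len(pairs), 2)]
mapper = make_mapper(*index_headers(pairs))
output = [rec for split in splits for rec in mapper(split)]
for rec in output:
    print(rec)
```

The first pass reads the whole file once, so it only pays off when headers can be found cheaply or the header table can be produced alongside the file.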