Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Question about map output

I'm looking for clarification on something I've read in Hadoop: The Definitive Guide. It states that map output is written locally and not to HDFS. But the map task (usually) runs on the node where its input data resides, and that data is in HDFS, correct? Or is it the case that the I/O done by the map task is standard Java I/O and not something like hadoop fs -put?

 

Thanks in advance to all who reply.

Kevin
1 ACCEPTED SOLUTION

The map task's output is not stored in HDFS. It is written to temporary
directories on the local filesystem of the node running the task (see the
property mapreduce.cluster.local.dir), using standard file I/O.

https://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-de...
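
To see which local directories a node would use, you can look up the property in its configuration file. The following is only a minimal sketch in Python: the XML content and the directory paths are made-up examples, and local_dirs is a hypothetical helper, not part of Hadoop. It assumes the value is a comma-separated list of local paths, as the property description allows.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapred-site.xml content; the directory paths are invented.
SAMPLE_CONF = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/data/1/mapred/local,/data/2/mapred/local</value>
  </property>
</configuration>
"""


def local_dirs(conf_xml, key="mapreduce.cluster.local.dir"):
    """Return the directories listed under `key` in a Hadoop-style config."""
    root = ET.fromstring(conf_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            value = prop.findtext("value") or ""
            # The value may be a comma-separated list of directories.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


for d in local_dirs(SAMPLE_CONF):
    # These are plain local filesystem paths, not hdfs:// URIs.
    print(d)
```

If the property is not set in the file you parse, the helper returns an empty list; the effective value would then come from the defaults documented at the link above.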

Regards,
Gautam Gopalakrishnan
