How do multiple reducers write their output? Can multiple reducers write to a single output file?
- Labels: Apache Hadoop
Created ‎02-22-2017 01:45 PM
1. How do multiple reducers write their output? Can multiple reducers write to a single output file?
Or do we have to write an intermediate reducer to do so? I just want to know how to get a single output file from multiple reducers.
2. Can a map task and a reduce task run in the same container, or can more than one map or reduce task run in the same container?
If yes, how?
How are containers assigned to map and reduce tasks?
Created ‎02-22-2017 03:23 PM
Please find the details below:
1. How do multiple reducers write their output? Can multiple reducers write to a single output file? Or do we have to write an intermediate reducer to do so? I just want to know how to get a single output file from multiple reducers.
-> By default, each reducer writes its own output file into the job's output directory in HDFS, named like part-r-00000 (part-00000 with the older mapred API). Reducers cannot write to one shared file. To end up with a single file, either run the job with a single reducer, or merge the part files afterwards with the hadoop fs -getmerge command, which concatenates them into one file on the local file system. MultipleOutputs does not help here: it lets one task write several files, not several tasks write one file.
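To make the "one part file per reducer, then merge" idea concrete, here is a small local simulation in plain Python. It does not call Hadoop at all: the function names (partition, run_reducers, getmerge) and the CRC32 hash are my own stand-ins for the default hash partitioning and for what hadoop fs -getmerge does, so treat it as a sketch of the behaviour, not of the real implementation.

```python
import os
import tempfile
import zlib


def partition(key, num_reducers):
    # Hypothetical stand-in for a hash partitioner: every occurrence of a
    # key is routed to the same reducer index.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_reducers(records, num_reducers, out_dir):
    # Each simulated reducer sums the values of the keys it receives and
    # writes its own part-r-NNNNN file; reducers never share a file.
    buckets = [dict() for _ in range(num_reducers)]
    for key, value in records:
        bucket = buckets[partition(key, num_reducers)]
        bucket[key] = bucket.get(key, 0) + value
    paths = []
    for index, bucket in enumerate(buckets):
        path = os.path.join(out_dir, "part-r-%05d" % index)
        with open(path, "w", encoding="utf-8") as handle:
            for key in sorted(bucket):
                handle.write("%s\t%d\n" % (key, bucket[key]))
        paths.append(path)
    return paths


def getmerge(out_dir, merged_path):
    # Concatenate the part files in name order into one file, similar in
    # spirit to merging a job output directory into a single local file.
    names = sorted(n for n in os.listdir(out_dir) if n.startswith("part-"))
    with open(merged_path, "w", encoding="utf-8") as merged:
        for name in names:
            with open(os.path.join(out_dir, name), encoding="utf-8") as part:
                merged.write(part.read())
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        out_dir = os.path.join(base, "output")
        os.mkdir(out_dir)
        run_reducers([("a", 1), ("b", 2), ("a", 3)], 3, out_dir)
        print(getmerge(out_dir, os.path.join(base, "merged.txt")))
```

Note that the merged file is only sorted within each part, not globally. If you need one globally sorted file, a single reducer is the simpler route.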
2. Can a map task and a reduce task run in the same container, or can more than one map or reduce task run in the same container? If yes, how? How are containers assigned to map and reduce tasks?
-> In MapReduce v1, each node had a fixed number of map slots and reduce slots; map tasks could run only in map slots and reduce tasks only in reduce slots. In MapReduce v2 (YARN), containers are generic: any container can host either a map task or a reduce task, and the application master requests one container per task as resources become available. So map and reduce tasks use the same kind of container, but never in parallel inside one container.
-> A container runs only one map or reduce task at a time.
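Since a container runs one task at a time, a job with many tasks does not need that many containers at once; the tasks simply run in waves as containers free up. A rough back-of-the-envelope sketch in Python, assuming memory is the only limit (it ignores vcores, the application master's own container, reducers and data locality). The function name and the numbers are made up for illustration.

```python
import math


def map_waves(num_tasks, cluster_memory_mb, container_memory_mb):
    # How many containers fit in the cluster at the same time, judged by
    # memory alone (an assumption made to keep the sketch simple).
    concurrent = cluster_memory_mb // container_memory_mb
    if concurrent < 1:
        raise ValueError("cluster cannot fit even one container")
    # One task per container at a time, so tasks run in this many waves.
    waves = math.ceil(num_tasks / concurrent)
    return concurrent, waves


if __name__ == "__main__":
    # Hypothetical cluster: 200 GB for containers, 2 GB per map container.
    print(map_waves(1000, 200 * 1024, 2048))
```

With these assumed numbers, 100 containers can run at once, so 1000 map tasks would take about 10 waves.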
Created ‎02-22-2017 01:47 PM
Suppose we want to run 1000 map tasks. Do we need 1000 containers, or can we run them with fewer than 1000?
