- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
MR job Split locations null
- Labels:
-
MapReduce
Created 09-07-2018 08:12 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am running a mapreduce job, calculating the split locations and I see job.split and job.splitmetainfo files contain the locations, but mapper side it is prining the locations null
CDH enterprise 5.14.0
Sep 7, 9:16:49.300 AM INFO org.apache.hadoop.mapred.MapTask
Processing split: AccInputSplit [splitId=174, locations=[null, null, null, null, null, null, null, null, null, null], splitLength=516537674]
anybody seen like this?
Created on 09-10-2018 07:12 PM - edited 09-10-2018 07:12 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
> but getLocations() is fine, we are using fair scheduler, which should take getLocations() and schedule the job to according to data localituy right?
I'm not sure I entirely follow. Do you mean to say you return constant values as part of getLocations() in your implementation of AccInputSplit? If that is so, then yes I believe it should work. In that case, could you share masked snippets from your implementation of toString() and getLocations() for a closer look?
However, if your getLocations() is intended to be dynamic then you must absolutely and correctly implement the write/readFields serialization methods. This is because the resource requests are made by the Application Master after it reads and understands the splits file prepared by the client (client calls write to serialize the split objects, AM calls readFields to deserialize them). If your readFields is a dummy method, then the objects constructed in the AM runtime will not carry all the data you intended it to carry from the client end.
Created 09-07-2018 09:14 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
the locations correctly when deserialising? I can only guess at what's
wrong for this specific situation without the relevant source bits, sorry.
The tasks all receive the same splits file you've inspected.
Does a local job runner test work fine?
Created 09-07-2018 09:20 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>> your custom split class' readFields method
I will investigate this, update this thread, thanks
Created 09-10-2018 08:58 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Harsh J you are right readFields is just a dummy,
but getLocations() is fine, we are using fair scheduler, which should take getLocations() and schedule the job to according to data localituy right?
Created on 09-10-2018 07:12 PM - edited 09-10-2018 07:12 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
> but getLocations() is fine, we are using fair scheduler, which should take getLocations() and schedule the job to according to data localituy right?
I'm not sure I entirely follow. Do you mean to say you return constant values as part of getLocations() in your implementation of AccInputSplit? If that is so, then yes I believe it should work. In that case, could you share masked snippets from your implementation of toString() and getLocations() for a closer look?
However, if your getLocations() is intended to be dynamic then you must absolutely and correctly implement the write/readFields serialization methods. This is because the resource requests are made by the Application Master after it reads and understands the splits file prepared by the client (client calls write to serialize the split objects, AM calls readFields to deserialize them). If your readFields is a dummy method, then the objects constructed in the AM runtime will not carry all the data you intended it to carry from the client end.
Created 09-11-2018 07:45 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the help,
proper implmentation of readfields solved the problem.
