I'm using a map-only Hadoop job to transfer files from S3 into a local cluster. Along the way, I split the records into separate directories based on record type using MultipleOutputs. When a map task dies due to S3 connection issues, it leaves its MultipleOutputs files behind, and the retried attempt then fails because those files already exist.
Is there a way to avoid this? Can I ask the mapper what file each named output will write to and delete those files in the setup() call?
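Something like the sketch below is what I have in mind, in case it helps clarify the question. It assumes the leftover files land directly under the job's output directory (i.e., the writes bypass the committer's temporary attempt directory) and follow Hadoop's usual `{name}-m-NNNNN` naming; the class name and the list of named outputs are placeholders for my actual record types.

```java
import java.io.IOException;
import java.text.NumberFormat;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TransferMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Placeholder: the names registered via MultipleOutputs.addNamedOutput().
    private static final String[] NAMED_OUTPUTS = { "typeA", "typeB", "typeC" };

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The partition number is the same across all attempts of this task,
        // so a retry can reconstruct the file names a failed attempt used.
        int partition = context.getTaskAttemptID().getTaskID().getId();

        // Hadoop pads part numbers to five digits, e.g. "typeA-m-00042".
        NumberFormat nf = NumberFormat.getInstance();
        nf.setMinimumIntegerDigits(5);
        nf.setGroupingUsed(false);

        Path outputDir = FileOutputFormat.getOutputPath(context);
        FileSystem fs = outputDir.getFileSystem(context.getConfiguration());

        // Delete whatever a failed earlier attempt may have left behind.
        for (String name : NAMED_OUTPUTS) {
            Path leftover = new Path(outputDir, name + "-m-" + nf.format(partition));
            if (fs.exists(leftover)) {
                fs.delete(leftover, false);
            }
        }
    }
}
```

Is relying on the `{name}-m-{partition}` naming like this safe, or is there a supported way to get the paths MultipleOutputs will actually use?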