How to overwrite an existing output file/dir during execution of MapReduce jobs?
Labels: Apache Hadoop
Created 10-06-2017 09:13 AM
2 REPLIES
Created 10-06-2017 10:41 AM
You can remove the existing output directory before the job starts by calling FileSystem.delete(Path, boolean). The second argument, true, makes the delete recursive.
Example:
Path path = new Path(outputDirectory);
path.getFileSystem(conf).delete(path,true);
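The two lines above can be wrapped in a small helper that first checks whether the path exists, so the driver can tell "nothing to delete" apart from "deleted". This is only a minimal sketch: the class name OutputDirCleaner and the method name deleteIfExists are made up for illustration, and it assumes the Hadoop client libraries are on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not part of Hadoop: clears a job's output directory.
class OutputDirCleaner {

    /**
     * Recursively deletes outputDirectory if it exists.
     *
     * @return true if the directory existed and was deleted,
     *         false if there was nothing to delete
     */
    static boolean deleteIfExists(Configuration conf, String outputDirectory)
            throws IOException {
        Path path = new Path(outputDirectory);
        // Resolve the file system that owns this path (HDFS, local, ...).
        FileSystem fs = path.getFileSystem(conf);
        if (!fs.exists(path)) {
            return false;
        }
        // true = recursive, so all files inside the directory are removed too.
        return fs.delete(path, true);
    }
}
```

In a driver, this would be called once with the job's configuration and output path before the job is submitted, so that the output-directory check no longer fails.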
Created 10-07-2017 10:17 AM
There are two ways to delete the output directory (not recommended) before running a MapReduce job:
1) Using the shell:
hdfs dfs -rm -r /path/to/your/output/
(on older releases: bin/hadoop dfs -rmr /path/to/your/output/; both hadoop dfs and -rmr are deprecated)
2) Using the Java API:
// the configuration should contain a reference to your NameNode
FileSystem fs = FileSystem.get(new Configuration());
// true means the given folder is deleted recursively
fs.delete(new Path("/path/to/your/output"), true);
If you want to overwrite the existing output instead:
Extend the Hadoop OutputFormat class and override checkOutputSpecs so that the "already exists" check is skipped:
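import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileAlreadyExistsException;
import org.apache.hadoop.mapred.InvalidJobConfException;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapreduce.security.TokenCache;

// Uses the old (org.apache.hadoop.mapred) API, as the JobConf parameter implies
public class OverwriteOutputDirOutputFile<K, V> extends TextOutputFormat<K, V> {
    @Override
    public void checkOutputSpecs(FileSystem ignored, JobConf job)
            throws FileAlreadyExistsException,
                   InvalidJobConfException, IOException {
        // Ensure that the output directory is set
        Path outDir = getOutputPath(job);
        if (outDir == null && job.getNumReduceTasks() != 0) {
            throw new InvalidJobConfException("Output directory not set in JobConf.");
        }
        if (outDir != null) {
            FileSystem fs = outDir.getFileSystem(job);
            // normalize the output directory
            outDir = fs.makeQualified(outDir);
            setOutputPath(job, outDir);
            // get delegation token for the outDir's file system
            TokenCache.obtainTokensForNamenodes(job.getCredentials(),
                    new Path[] {outDir}, job);
            // existence check deliberately disabled so an existing directory is accepted
            /* if (fs.exists(outDir)) {
                throw new FileAlreadyExistsException("Output directory " + outDir +
                        " already exists");
            } */
        }
    }
}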
