Hi, I recommend using Hadoop Archives (HAR).

Usage:

    hadoop archive -archiveName name -p <parent> <src>* <dest>

-archiveName is the name of the archive you would like to create, for example foo.har. The name should have a .har extension.

The -p (parent) argument specifies the path that the source paths are relative to. For example:

    -p /foo/bar a/b/c e/f/g

Here /foo/bar is the parent path, and a/b/c and e/f/g are paths relative to the parent.

Note that the archive is created by a Map/Reduce job, so you need a Map/Reduce cluster to run this. For a detailed example, see the later sections of the Hadoop Archives guide.

If you just want to archive a single directory /foo/bar, you can use:

    hadoop archive -archiveName zoo.har -p /foo/bar /outputdir
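To make the argument order easier to see, here is a minimal sketch of a shell helper that only prints the hadoop archive command instead of running it. The function name har_cmd and its argument order are my own invention for illustration and are not part of Hadoop; the paths are the ones from the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints the "hadoop archive"
# command for review instead of running it.
# Usage: har_cmd <archive-name> <parent> <dest> [<src>...]
har_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: har_cmd <archive-name> <parent> <dest> [<src>...]" >&2
        return 1
    fi
    name=$1
    parent=$2
    dest=$3
    shift 3

    # The archive name should carry a .har extension.
    case "$name" in
        *.har) ;;
        *) echo "archive name should end in .har: $name" >&2; return 1 ;;
    esac

    # Sources are relative to the parent and come before the destination.
    cmd="hadoop archive -archiveName $name -p $parent"
    for src in "$@"; do
        cmd="$cmd $src"
    done
    echo "$cmd $dest"
}

# Two relative sources under the parent /foo/bar.
har_cmd foo.har /foo/bar /outputdir a/b/c e/f/g

# No sources: archive the whole parent directory.
har_cmd zoo.har /foo/bar /outputdir
```

Once the printed command looks right, you can run it on the cluster yourself. As far as I know, the resulting archive can then be listed through the har:// scheme, for example with hadoop fs -ls har:///outputdir/zoo.har, but check this against your Hadoop version.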