Created 12-16-2015 10:01 AM
I'm looking at https://hadoop.apache.org/docs/current/hadoop-proj... and I don't see an easy way to copy a whole folder.
Do I have to list the contents of the folder and download the files one by one?
Created 12-16-2015 01:25 PM
add "recursive" switch
curl -i -X DELETE "http://<host>:<port>/webhdfs/v1/<path>?op=DELETE [&recursive=<true |false>]"
Created 12-16-2015 01:33 PM
@Artem Ervits Looks like he is asking for a way to copy the contents of a whole directory, rather than delete it.
Created 12-16-2015 01:42 PM
I'm aware of that; it was just the only available example. Add recursive=true to the command you want to execute.
Created 12-16-2015 06:32 PM
Downloading an entire directory would be a recursive operation that walks the entire sub-tree, downloading each file it encounters in that sub-tree. The WebHDFS REST API alone doesn't implement any such recursive operations. (The recursive=true option for DELETE is a different case, because it's just telling the NameNode to prune the whole sub-tree. There isn't any need to traverse the sub-tree and return results to the caller along the way.) Recursion is something that would have to be implemented on the client side by listing the contents of a directory, and then handling the children returned for that directory.
Depending on what you need to do, it might be sufficient to use the "hdfs dfs -copyToLocal" CLI command using a path with the "webhdfs" URI scheme and a wildcard. Here is an example:
> hdfs dfs -ls webhdfs://localhost:50070/file*
-rw-r--r--   3 chris supergroup   6 2015-12-15 10:13 webhdfs://localhost:50070/file1
-rw-r--r--   3 chris supergroup   6 2015-12-15 10:13 webhdfs://localhost:50070/file2
> hdfs dfs -copyToLocal webhdfs://localhost:50070/file*
> ls -lrt file*
-rw-r--r--+  1 chris  staff  6B Dec 16 10:23 file2
-rw-r--r--+  1 chris  staff  6B Dec 16 10:23 file1
In this example, the "hdfs dfs -copyToLocal" command made a WebHDFS HTTP call to the NameNode to list the contents of "/". It then filtered the returned results by the glob pattern "file*". Based on those filtered results, it then sent a series of additional HTTP calls to the NameNode and DataNodes to get the contents of file1 and file2 and write them locally.
This isn't a recursive solution though. Wildcard glob matching is only sufficient for matching a static pattern and walking to a specific depth in the tree. It can't fully discover and walk the whole sub-tree. That would require custom application code.
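If you do need the fully recursive behavior, that custom code can be fairly small. Below is a minimal sketch, not production code: it assumes the requests library, an unsecured (non-Kerberos) cluster where the user.name query parameter is sufficient, and the same NameNode address as above. The paths in the final line are hypothetical placeholders.

import os
import requests  # assumption: the requests library is installed

NAMENODE = "http://localhost:50070"  # assumption: same NameNode as above
USER = "chris"  # assumption: simple auth; Kerberized clusters need SPNEGO instead

def download_tree(hdfs_path, local_dir):
    """Recursively copy an HDFS directory to local disk over WebHDFS."""
    os.makedirs(local_dir, exist_ok=True)
    # LISTSTATUS returns the immediate children of hdfs_path as JSON.
    resp = requests.get(NAMENODE + "/webhdfs/v1" + hdfs_path,
                        params={"op": "LISTSTATUS", "user.name": USER})
    resp.raise_for_status()
    for status in resp.json()["FileStatuses"]["FileStatus"]:
        child = hdfs_path.rstrip("/") + "/" + status["pathSuffix"]
        target = os.path.join(local_dir, status["pathSuffix"])
        if status["type"] == "DIRECTORY":
            download_tree(child, target)  # recurse into the sub-tree
        else:
            # OPEN redirects to a DataNode, which streams the file bytes.
            data = requests.get(NAMENODE + "/webhdfs/v1" + child,
                                params={"op": "OPEN", "user.name": USER})
            data.raise_for_status()
            with open(target, "wb") as f:
                f.write(data.content)

download_tree("/user/chris/data", "./data")  # hypothetical example paths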
Created 12-16-2015 10:21 PM
Thank you! I will play with it.