
How would you download (copy) a directory with WebHDFS API?

Solved

How would you download (copy) a directory with WebHDFS API?

I'm looking at https://hadoop.apache.org/docs/current/hadoop-proj... and I can't find an easy way to copy a whole folder.

Do I have to list the contents of the folder and then download the files one by one?

1 ACCEPTED SOLUTION


Re: How would you download (copy) a directory with WebHDFS API?

Downloading an entire directory would be a recursive operation that walks the entire sub-tree, downloading each file it encounters in that sub-tree. The WebHDFS REST API alone doesn't implement any such recursive operations. (The recursive=true option for DELETE is a different case, because it's just telling the NameNode to prune the whole sub-tree. There isn't any need to traverse the sub-tree and return results to the caller along the way.) Recursion is something that would have to be implemented on the client side by listing the contents of a directory, and then handling the children returned for that directory.

Depending on what you need to do, it might be sufficient to use the "hdfs dfs -copyToLocal" CLI command using a path with the "webhdfs" URI scheme and a wildcard. Here is an example:

> hdfs dfs -ls webhdfs://localhost:50070/file*
-rw-r--r--   3 chris supergroup          6 2015-12-15 10:13 webhdfs://localhost:50070/file1
-rw-r--r--   3 chris supergroup          6 2015-12-15 10:13 webhdfs://localhost:50070/file2

> hdfs dfs -copyToLocal webhdfs://localhost:50070/file*

> ls -lrt file*
-rw-r--r--+ 1 chris  staff     6B Dec 16 10:23 file2
-rw-r--r--+ 1 chris  staff     6B Dec 16 10:23 file1

In this example, the "hdfs dfs -copyToLocal" command made a WebHDFS HTTP call to the NameNode to list the contents of "/". It then filtered the returned results by the glob pattern "file*". Based on those filtered results, it then sent a series of additional HTTP calls to the NameNode and DataNodes to get the contents of file1 and file2 and write them locally.

This isn't a recursive solution though. Wildcard glob matching is only sufficient for matching a static pattern and walking to a specific depth in the tree. It can't fully discover and walk the whole sub-tree. That would require custom application code.
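
For anyone who does need the fully recursive behavior, here is a minimal sketch of that kind of custom client code, using only the Python standard library against the raw WebHDFS REST API. It assumes an unsecured cluster reachable on the same NameNode HTTP port used in the example above (50070), simple authentication via the user.name query parameter, and hypothetical paths and user name; Kerberos, error handling, and retries are left out.

import json
import os
import urllib.request

# Hypothetical NameNode address and user (not from the original post).
NAMENODE = "http://localhost:50070"
USER = "chris"

def webhdfs_url(path, op):
    # Build a WebHDFS URL for the given HDFS path and operation,
    # using simple auth via the user.name query parameter.
    return "%s/webhdfs/v1%s?op=%s&user.name=%s" % (NAMENODE, path, op, USER)

def download_tree(hdfs_dir, local_dir):
    # List the directory with LISTSTATUS, then recurse into sub-directories
    # and fetch each file with OPEN.
    os.makedirs(local_dir, exist_ok=True)
    with urllib.request.urlopen(webhdfs_url(hdfs_dir, "LISTSTATUS")) as resp:
        statuses = json.load(resp)["FileStatuses"]["FileStatus"]
    for status in statuses:
        name = status["pathSuffix"]
        hdfs_path = hdfs_dir.rstrip("/") + "/" + name
        local_path = os.path.join(local_dir, name)
        if status["type"] == "DIRECTORY":
            download_tree(hdfs_path, local_path)
        else:
            # OPEN redirects to a DataNode; urllib follows the redirect.
            with urllib.request.urlopen(webhdfs_url(hdfs_path, "OPEN")) as src:
                with open(local_path, "wb") as dst:
                    dst.write(src.read())

download_tree("/user/chris/data", "./data")   # hypothetical source and destination

Note that this still amounts to one LISTSTATUS call per directory and one OPEN call (redirected to a DataNode) per file, so it is essentially the same sequence of HTTP requests the CLI issues, just driven by client-side recursion.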

5 REPLIES

Re: How would you download (copy) a directory with WebHDFS API?

Mentor

add "recursive" switch

curl -i -X DELETE "http://<host>:<port>/webhdfs/v1/<path>?op=DELETE[&recursive=<true|false>]"

Re: How would you download (copy) a directory with WebHDFS API?

@Artem Ervits Looks like he is asking for a way to copy the contents of a whole directory rather than delete it.

Re: How would you download (copy) a directory with WebHDFS API?

Mentor

I'm aware of that; this was the only available example. Just add recursive=true to the command you want to execute.


Re: How would you download (copy) a directory with WebHDFS API?

Thank you! I will play with it.
