Created 07-09-2016 04:52 PM
Is there a way to use the HDFS API to get a list of blocks and the data nodes that store a particular HDFS file?
If that's not possible, at a minimum, is there a way to determine which data nodes store a particular HDFS file?
Created 07-11-2016 01:16 AM
Hi @Leon L, the easiest way to do this from the command line, if you are an administrator, is to run the 'fsck' command with the -files -blocks -locations options, e.g.
$ hdfs fsck /myfile.txt -files -blocks -locations
Connecting to namenode via http://localhost:50070
FSCK started by someuser (auth:SIMPLE) from /127.0.0.1 for path /myfile.txt at Sun Jul 10 17:55:32 PDT 2016
/myfile.txt 875664 bytes, 1 block(s):  OK
0. BP-810817926-127.0.0.1-1468198364624:blk_1073741825_1001 len=875664 repl=1 [127.0.0.1:50010]
This returns the list of blocks for the file, along with the DataNodes that hold a replica of each block. It is a one-off solution if you only need block locations for a small number of files. If you need them programmatically, the public Java client API exposes the same information through FileSystem#getFileBlockLocations.
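A minimal sketch of the programmatic route using FileSystem#getFileBlockLocations (the path /myfile.txt is just an example; this assumes hadoop-client is on the classpath and the cluster is reachable via the core-site.xml / hdfs-site.xml configuration found there):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS etc. from the config files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/myfile.txt"); // example path, substitute your own
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block that overlaps the requested byte range;
        // here we ask for the whole file.
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d len=%d hosts=%s%n",
                block.getOffset(), block.getLength(),
                String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```

Each BlockLocation gives the block's offset and length within the file plus the hostnames of the DataNodes serving its replicas, which is the same information fsck prints.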
Could you please explain your use case?
Created 07-09-2016 11:39 PM
You could try:
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"
The client receives a response with a FileStatus JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{
  "FileStatus":
  {
    "accessTime"      : 0,
    "blockSize"       : 0,
    "group"           : "supergroup",
    "length"          : 0,             // in bytes, zero for directories
    "modificationTime": 1320173277227,
    "owner"           : "webuser",
    "pathSuffix"      : "",
    "permission"      : "777",
    "replication"     : 0,
    "type"            : "DIRECTORY"    // enum {FILE, DIRECTORY}
  }
}
Created 02-10-2018 08:17 AM
Is there any solution using the HDFS API?