
HDFS file actual block paths

New Contributor

Is there a way to use the HDFS API to get a list of blocks and the data nodes that store a particular HDFS file?

If that's not possible, at a minimum, is there a way to determine which data nodes store a particular HDFS file?

1 ACCEPTED SOLUTION


Hi @Leon L, the easiest way to do so from the command line, if you are an administrator, is to run the 'fsck' command with the -files -blocks -locations options, e.g.:

$ hdfs fsck /myfile.txt -files -blocks -locations
Connecting to namenode via http://localhost:50070
FSCK started by someuser (auth:SIMPLE) from /127.0.0.1 for path /myfile.txt at Sun Jul 10 17:55:32 PDT 2016
/myfile.txt 875664 bytes, 1 block(s):  OK
0. BP-810817926-127.0.0.1-1468198364624:blk_1073741825_1001 len=875664 repl=1 [127.0.0.1:50010]

This will return the list of blocks along with the DataNodes that hold the replicas of each block. This is a one-off approach that works well if you only need the block locations for a small number of files; for programmatic access, see the API sketch below.

Could you please explain your use case?
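
For reference, the public Java client API does expose this: FileSystem#getFileBlockLocations returns one BlockLocation per block, each listing the hosts that hold a replica (it reports hostnames, not the block-pool/block IDs that fsck prints). A minimal sketch, reusing /myfile.txt from the example above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml/hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/myfile.txt"); // same file as the fsck example
        FileStatus status = fs.getFileStatus(path);

        // One BlockLocation per block, covering the whole file [0, len)
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " len=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
        fs.close();
    }
}

Run it with the Hadoop client jars on the classpath so the Configuration picks up your cluster's settings.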


3 REPLIES

Expert Contributor

You could try:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"

The client receives a response with a FileStatus JSON object:

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{
  "FileStatus":
  {
    "accessTime"      : 0,
    "blockSize"       : 0,
    "group"           : "supergroup",
    "length"          : 0,             //in bytes, zero for directories
    "modificationTime": 1320173277227,
    "owner"           : "webuser",
    "pathSuffix"      : "",
    "permission"      : "777",
    "replication"     : 0,
    "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY}
  }
}
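
Note that GETFILESTATUS only returns file metadata (length, owner, replication factor), not block locations. If your Hadoop release is new enough (2.9 or later, if I recall HDFS-11156 correctly), WebHDFS also offers a GETFILEBLOCKLOCATIONS operation that returns the locations directly:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILEBLOCKLOCATIONS"

The response is a BlockLocations JSON object listing, for each block, its offset, length, and the hosts that store it.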


Explorer

Any solution using the HDFS API?