Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

hdfs file actual block paths

New Member

Is there a way to use the HDFS API to get a list of blocks and the data nodes that store a particular HDFS file?

If that's not possible, at a minimum, is there a way to determine which data nodes store a particular HDFS file?

1 ACCEPTED SOLUTION


Hi @Leon L, the easiest way to do this from the command line, if you are an administrator, is to run the 'fsck' command with the -files -blocks -locations options, e.g.:

$ hdfs fsck /myfile.txt -files -blocks -locations
Connecting to namenode via http://localhost:50070
FSCK started by someuser (auth:SIMPLE) from /127.0.0.1 for path /myfile.txt at Sun Jul 10 17:55:32 PDT 2016
/myfile.txt 875664 bytes, 1 block(s):  OK
0. BP-810817926-127.0.0.1-1468198364624:blk_1073741825_1001 len=875664 repl=1 [127.0.0.1:50010]

This returns the list of blocks in the file along with the DataNodes that hold each block's replicas. It is a handy one-off solution when you need the block locations for a small number of files. For programmatic access, the Java client's FileSystem#getFileBlockLocations method returns the same information.
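For programmatic access, the public org.apache.hadoop.fs.FileSystem#getFileBlockLocations call returns one BlockLocation per block, including the hosts that store its replicas. A minimal sketch, assuming the Hadoop client libraries are on the classpath, fs.defaultFS points at your cluster, and /myfile.txt is the file from the fsck example above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    // Format one block's location info as a single line.
    static String describe(long offset, long length, String[] hosts) {
        return "offset=" + offset + " len=" + length
                + " hosts=" + String.join(",", hosts);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/myfile.txt");      // the file from the fsck example
        FileStatus status = fs.getFileStatus(path);

        // One BlockLocation per block in the requested byte range; here, the whole file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println(describe(block.getOffset(), block.getLength(),
                                        block.getHosts()));
        }
        fs.close();
    }
}
```

Unlike fsck, this does not require administrator privileges, only read access to the file.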

Could you please explain your use case?


3 REPLIES

Expert Contributor

You could try:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"

The client receives a response with a FileStatus JSON object:

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{
  "FileStatus":
  {
    "accessTime"      : 0,
    "blockSize"       : 0,
    "group"           : "supergroup",
    "length"          : 0,             //in bytes, zero for directories
    "modificationTime": 1320173277227,
    "owner"           : "webuser",
    "pathSuffix"      : "",
    "permission"      : "777",
    "replication"     : 0,
    "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY}
  }
}
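Note that GETFILESTATUS returns only file metadata; the FileStatus object does not list block locations. For illustration, the same request can be issued from plain Java (no Hadoop client needed) with java.net.http; the host, port, and path below are placeholders matching the curl example:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WebHdfsStatus {
    // Build the WebHDFS GETFILESTATUS URL for a given NameNode host/port and HDFS path.
    static URI fileStatusUri(String host, int port, String path) {
        return URI.create("http://" + host + ":" + port
                + "/webhdfs/v1" + path + "?op=GETFILESTATUS");
    }

    public static void main(String[] args) throws Exception {
        URI uri = fileStatusUri("localhost", 50070, "/myfile.txt"); // placeholder values
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        // Prints the FileStatus JSON shown above (requires a running NameNode).
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```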


Visitor

Is there any solution using the HDFS API?