How to find the block locations of a file in HDFS using the Java API?

Explorer

Hello,

I want to get the block locations of a particular file, say abc.csv (500 MB), on a cluster with 1 NM and 3 DNs.

When I -put a file, it is divided into blocks of the default size (64 MB) and spread across the Hadoop cluster.
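
As a quick check of the numbers above, here is a minimal sketch of the block arithmetic for a 500 MB file and a 64 MB block size. The class and method names are hypothetical, and the block size is the one stated in this post; the actual value on a given cluster depends on its configuration.

```java
public class BlockCount {
    static final long MB = 1024L * 1024L;

    /** Number of blocks needed for a file: ceiling of fileLen / blockSize. */
    static long blockCount(long fileLen, long blockSize) {
        return (fileLen + blockSize - 1) / blockSize;
    }

    /** Length of the final block, which is shorter than blockSize unless the file divides evenly. */
    static long lastBlockLength(long fileLen, long blockSize) {
        long remainder = fileLen % blockSize;
        return (remainder == 0 && fileLen > 0) ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long fileLen = 500 * MB;   // abc.csv from the question
        long blockSize = 64 * MB;  // block size stated in the question (an assumption about the cluster)
        System.out.println(blockCount(fileLen, blockSize) + " blocks, last one "
                + lastBlockLength(fileLen, blockSize) / MB + " MB");
    }
}
```

Under these assumptions the file occupies 8 blocks: 7 full blocks of 64 MB and a final block of 52 MB. A block-location query for the whole file should therefore return 8 entries.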

Using the web interface at "http://namenode:50070", we can find the block locations across the cluster.

The same information is available from the command: hadoop fsck <file-path> -files -blocks -locations

What I am trying to achieve is to get this information through the Java API or a web API.

Please let me know the solution if any.

Any help will be appreciated.

1 ACCEPTED SOLUTION

Expert Contributor

@Yog Prabhhu, you can get the file block information from the WebHDFS REST API like this:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<FilePath>?op=GETFILEBLOCKLOCATIONS"

The corresponding Java API is FileSystem.getFileBlockLocations:

public BlockLocation[] getFileBlockLocations(FileStatus file,
    long start, long len)

You will get an array of block locations like below:

[BlockLocation(offset: 0, length: BLOCK_SIZE, hosts: {"host1:9866", "host2:9866", "host3:9866"}), ...]
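
If the Hadoop client libraries are not on the classpath, the WebHDFS call above can also be issued from plain Java. The following is a minimal sketch using only the JDK (Java 11+ `java.net.http`). The class name, method name, and the example host, port, and path are hypothetical placeholders. The port must be the NameNode HTTP port of your cluster, and the operation must be supported by your Hadoop version.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WebHdfsBlockLocations {

    /** Builds the WebHDFS URL for op=GETFILEBLOCKLOCATIONS on an absolute HDFS path. */
    static URI blockLocationsUri(String host, int port, String hdfsPath) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        try {
            // The multi-argument constructor percent-encodes characters that are illegal in a path.
            return new URI("http", null, host, port, "/webhdfs/v1" + hdfsPath,
                    "op=GETFILEBLOCKLOCATIONS", null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            // Placeholder values only; nothing is sent unless host, port, and path are given.
            System.out.println("usage: WebHdfsBlockLocations <host> <port> <hdfs-path>");
            System.out.println("example URL: " + blockLocationsUri("namenode", 50070, "/user/data/abc.csv"));
            return;
        }
        URI uri = blockLocationsUri(args[0], Integer.parseInt(args[1]), args[2]);
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The body is the JSON document returned by the NameNode; it is printed unparsed here.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

The sketch prints the raw JSON response rather than parsing it, since the JDK has no built-in JSON parser. On a secured cluster the request would also need authentication, which is not shown here.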

New Contributor

Hey, do you have any idea how I can actually locate and read a specific block using BlockLocation? I want to read the block byte by byte. (I understand it might be a remote read.)

Community Manager

@husseljo, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.



Regards,

Vidya Sargur,
Community Manager

