Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Downloading a file inside a hadoop archive using Apache Knox

I am able to download an HDFS file, /org/project/archived/data/hive/warehouse/Stats/2016_06_20.txt, in a browser through Knox using the URL below:

http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/Stats/2016_06_20.t...

Now I have a file inside a Hadoop archive (HAR), as shown below:

har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt

How can I do the same for the file above?

1 ACCEPTED SOLUTION

Master Guru

First, download the HAR's _index file, located at /org/project/archived/data/hive/warehouse/test.har/_index. Then find the entry for Stats/2016_06_20.txt in _index and note its data-n file, the offset within that data file, and the file length. Suppose the file is in data-0 with offset=125000 and file-length=8200; you can then access it at:

http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/test.har/data-0?op...

Check this nicely written blog for a full example and a PHP script which can automate the process.
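
The lookup described above can be sketched in Python. This is only an illustration under stated assumptions, not the script from the blog. It assumes each file entry in _index is a space-separated line of the form "<url-encoded path> file <data file> <offset> <length> ...", and that the byte range is requested with the WebHDFS OPEN operation and its offset and length parameters. The helper names find_har_entry and webhdfs_range_url are hypothetical, and the sample index text is made up to mirror the numbers in this answer; a real _index may differ by HAR version. The data file name is read from the index entry rather than hard-coded, since Hadoop's documentation describes these files as part-*.

```python
from urllib.parse import quote, unquote, urlencode


def find_har_entry(index_text, inner_path):
    """Return (data_file, offset, length) for inner_path, or None if absent.

    Assumes each file entry looks like:
    <url-encoded path> file <data file> <offset> <length> <other properties>
    """
    for line in index_text.splitlines():
        fields = line.split(" ")
        # Skip directory entries and anything too short to be a file entry.
        if len(fields) < 5 or fields[1] != "file":
            continue
        if unquote(fields[0]) == inner_path:
            return fields[2], int(fields[3]), int(fields[4])
    return None


def webhdfs_range_url(base, har_path, data_file, offset, length):
    """Build a WebHDFS OPEN URL that reads `length` bytes starting at `offset`."""
    path = quote(f"{har_path}/{data_file}")
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return f"{base}{path}?{query}"


# Made-up sample standing in for a downloaded _index file (illustration only).
sample_index = (
    "%2FStats dir none 0 0 2016_06_20.txt\n"
    "%2FStats%2F2016_06_20.txt file data-0 125000 8200 sample-properties\n"
)

entry = find_har_entry(sample_index, "/Stats/2016_06_20.txt")
url = webhdfs_range_url(
    "http://hostname:8443/knox/nm1/webhdfs/v1",
    "/org/project/archived/data/hive/warehouse/test.har",
    *entry,
)
print(url)
```

Opening the printed URL in a browser should return only the bytes of the archived file, provided the gateway passes the offset and length parameters through to WebHDFS unchanged.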

4 REPLIES

Guru

@pooja khandelwal: You can use the following approach to copy a file from a Hadoop archive to your local machine.

hadoop fs -text har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt > 2016_06_20.txt

I want to download the file through the browser only.

Thank you.