
Downloading a file inside a hadoop archive using Apache Knox

I can download the HDFS file /org/project/archived/data/hive/warehouse/Stats/2016_06_20.txt in a browser through Knox using the URL below:

http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/Stats/2016_06_20.t...
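The URL above is cut off, but a WebHDFS read of a whole file goes through the `OPEN` operation. Here is a minimal sketch of building such a Knox-proxied URL. It assumes the gateway base `http://hostname:8443/knox/nm1/webhdfs/v1` from the question and leaves authentication to the browser or client. The helper name `webhdfs_open_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Gateway base taken from the question; host, port, and topology ("nm1") are cluster-specific.
GATEWAY = "http://hostname:8443/knox/nm1/webhdfs/v1"


def webhdfs_open_url(hdfs_path: str, gateway: str = GATEWAY) -> str:
    """Build a Knox-proxied WebHDFS OPEN URL for an absolute HDFS path (hypothetical helper)."""
    if not hdfs_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    # Percent-encode unsafe characters in the path but keep the '/' separators.
    return f"{gateway}{quote(hdfs_path, safe='/')}?{urlencode({'op': 'OPEN'})}"


if __name__ == "__main__":
    print(webhdfs_open_url("/org/project/archived/data/hive/warehouse/Stats/2016_06_20.txt"))
```

Pasting the printed URL into a browser should behave like the URL in the question, assuming the gateway accepts the browser's credentials.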

Now I have a file inside a Hadoop archive:

har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt

How can I download this file the same way?

Accepted Solution

First, download the HAR's _index file, located at /org/project/archived/data/hive/warehouse/test.har/_index. Then find the entry for Stats/2016_06_20.txt in _index. It tells you which data-n file holds the contents, the offset within that data file, and the file's length. Suppose the file is in data-0 at offset=125000 with length=8200; then you can access it with:

http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/test.har/data-0?op...
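WebHDFS's `OPEN` operation accepts `offset` and `length` query parameters, which is what makes it possible to read one archived file out of a larger data file. Below is a minimal sketch of the lookup described above. It rests on a few assumptions:

* The _index uses the layout written by recent `hadoop archive` versions: `<url-encoded path> file <part name> <offset> <length> ...`.
* The HAR URI has no host or port (the `har:///` form).
* The sample index text is hypothetical.

Archives created with `hadoop archive` normally name their data files `part-0`, `part-1`, and so on, so the sketch reads the data-file name from the index entry instead of assuming `data-0`. All function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote, unquote_plus, urlencode

# Gateway base taken from the question; host, port, and topology ("nm1") are cluster-specific.
GATEWAY = "http://hostname:8443/knox/nm1/webhdfs/v1"

# Hypothetical, simplified _index contents for illustration only
# (real entries carry extra modification-time/permission/owner fields).
SAMPLE_INDEX = (
    "%2F dir none 0 0 Stats \n"
    "%2FStats dir none 0 0 2016_06_20.txt \n"
    "%2FStats%2F2016_06_20.txt file part-0 125000 8200 \n"
)


def split_har_uri(uri: str) -> Tuple[str, str]:
    """Split 'har:///<dir>.har/<inner>' into (archive directory, '/<inner>')."""
    # Assumption: no host/port in the URI and only one '.har/' segment.
    path = uri[len("har://"):] if uri.startswith("har://") else uri
    end = path.index(".har/") + len(".har")
    return path[:end], path[end:]


def find_in_har_index(index_text: str, inner_path: str) -> Tuple[str, int, int]:
    """Return (data file name, offset, length) for a file entry in a HAR _index."""
    for line in index_text.splitlines():
        fields = line.split(" ")
        # Skip blank lines, directory entries, and anything malformed.
        if len(fields) < 5 or fields[1] != "file":
            continue
        # Entry names are URL-encoded (e.g. '%2FStats%2F2016_06_20.txt').
        if unquote_plus(fields[0]) == inner_path:
            return fields[2], int(fields[3]), int(fields[4])
    raise KeyError(f"{inner_path} not found in _index")


def ranged_open_url(har_dir: str, part: str, offset: int, length: int,
                    gateway: str = GATEWAY) -> str:
    """WebHDFS OPEN URL that reads `length` bytes of `part` starting at `offset`."""
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return f"{gateway}{quote(har_dir + '/' + part, safe='/')}?{query}"


if __name__ == "__main__":
    har_dir, inner = split_har_uri(
        "har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt")
    part, offset, length = find_in_har_index(SAMPLE_INDEX, inner)
    print(ranged_open_url(har_dir, part, offset, length))
```

In practice, replace `SAMPLE_INDEX` with the text of test.har/_index, fetched through the same gateway with a plain `op=OPEN` request. A linear scan of _index is fine for small archives. For very large ones, the _masterindex file can narrow the search.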

See this well-written blog post for a full example and a PHP script that automates the process.


4 Replies


@pooja khandelwal: You can use the following command to copy the archived file to your local machine:

hadoop fs -text har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt > 2016_06_20.txt

I want to download the file through the browser only.

Thank you.