Support Questions

pooja_khandelwa · ‎06-21-2016

I am able to download an HDFS file, /org/project/archived/data/hive/warehouse/Stats/2016_06_20.txt, in a browser through Knox using the URL below.

http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/Stats/2016_06_20.t...

Now I have a file inside a Hadoop archive, as below.

har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt

How can I do the same for the above file?

SK1 · ‎06-21-2016

@pooja khandelwal: You can use the following approach to copy a file from the Hadoop archive to your local machine.

hadoop fs -text har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt > 2016_06_20.txt

pooja_khandelwa · ‎06-21-2016

I want to download the file through the browser only.

pminovic · ‎06-21-2016

First download the HAR's _index file, located at /org/project/archived/data/hive/warehouse/test.har/_index. Then find the entry for Stats/2016_06_20.txt in _index and note three things: the data-n file that holds it, its offset within that data file, and its length. Suppose it is in data-0 with offset=125000 and length=8200; then you can access

http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/test.har/data-0?op...

Check this nicely written blog for a full example and a PHP script which can automate the process.
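
The lookup and URL construction described above can be sketched in Python as follows. This is a minimal, untested-against-a-real-cluster sketch, not the blog's script. It assumes each file entry in _index is a space-separated line of the form "<url-encoded path> file <data file> <offset> <length> ...", and it uses the standard WebHDFS OPEN parameters offset and length. The function names and the sample index line are hypothetical, made up to match the numbers in this post.

```python
from urllib.parse import quote, unquote, urlencode


def find_har_entry(index_text, path_in_archive):
    """Return (data_file, offset, length) for one file listed in a HAR _index.

    Assumed line layout: <url-encoded path> file <data file> <offset> <length> ...
    Directory entries and malformed lines are skipped.
    """
    wanted = "/" + path_in_archive.strip("/")
    for line in index_text.splitlines():
        fields = line.split(" ")
        if len(fields) < 5 or fields[1] != "file":
            continue
        if unquote(fields[0]) == wanted:
            return fields[2], int(fields[3]), int(fields[4])
    raise KeyError(path_in_archive)


def webhdfs_range_url(base_url, har_path, data_file, offset, length):
    """Build a WebHDFS OPEN URL that reads `length` bytes starting at `offset`."""
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return f"{base_url.rstrip('/')}{quote(har_path.rstrip('/'))}/{data_file}?{query}"


# Hypothetical _index content; the trailing properties field is invented.
sample_index = (
    "/Stats dir 1466400000000+493+hdfs+hdfs 0 0 2016_06_20.txt\n"
    "/Stats/2016_06_20.txt file data-0 125000 8200 1466400000000+420+hdfs+hdfs\n"
)

data_file, offset, length = find_har_entry(sample_index, "Stats/2016_06_20.txt")
url = webhdfs_range_url(
    "http://hostname:8443/knox/nm1/webhdfs/v1",
    "/org/project/archived/data/hive/warehouse/test.har",
    data_file,
    offset,
    length,
)
print(url)
```

The data file name is taken from the index entry rather than hard-coded, so the same code works whichever data file holds the entry. Pasting the printed URL into the browser should then return only the bytes of the archived file.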

pooja_khandelwa · ‎06-22-2016

Thank you.

Cloudera Community

Downloading a file inside a hadoop archive using Apache Knox