Support Questions

pooja_khandelwa · ‎06-21-2016

I am able to download an HDFS file, /org/project/archived/data/hive/warehouse/Stats/2016_06_20.txt, in a browser through Knox using the URL below.

http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/Stats/2016_06_20.t...

Now I have a file inside a Hadoop archive, as shown below.

har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt

How can I do the same for the above file?

SK1 · ‎06-21-2016

@pooja khandelwal: You can use the following approach to copy a file from the Hadoop archive to your local machine.

hadoop fs -text har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt > 2016_06_20.txt

pooja_khandelwa · ‎06-21-2016

I want to download the file through a browser only.

pminovic · ‎06-21-2016

First, download the HAR's _index file, located at /org/project/archived/data/hive/warehouse/test.har/_index. Then find the entry for Stats/2016_06_20.txt in _index and note three things: the data-n file that holds it, the offset within that data file, and the file's length. Suppose it is in data-0 with offset=125000 and file-length=8200; then you can access

http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/test.har/data-0?op...

Check this nicely written blog for a full example and a PHP script which can automate the process.
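The lookup described above can be sketched in Python. This is only an illustration under stated assumptions, not the script from the blog. It assumes each file entry in _index is a whitespace-separated line of the form `<path> file <data-file> <offset> <length> ...`, with a URL-encoded path relative to the archive root. It also assumes the byte range is requested with the standard WebHDFS `op=OPEN` parameters `offset` and `length`. The names `find_entry` and `build_open_url` and the contents of `sample_index` are made up for the example.

```python
from urllib.parse import quote, unquote, urlencode


def find_entry(index_text, inner_path):
    """Return (data_file, offset, length) for inner_path, or None if absent."""
    for line in index_text.splitlines():
        fields = line.split()
        # Assumed layout: <path> file <data-file> <offset> <length> [...]
        if (
            len(fields) >= 5
            and fields[1] == "file"
            and unquote(fields[0]) == inner_path
        ):
            return fields[2], int(fields[3]), int(fields[4])
    return None


def build_open_url(webhdfs_base, har_dir, data_file, offset, length):
    """Build a WebHDFS OPEN URL that reads only the requested byte range."""
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return f"{webhdfs_base}{quote(har_dir)}/{data_file}?{query}"


# Hypothetical _index content; a real file would be downloaded through Knox.
sample_index = (
    "/Stats dir 1466400000000+493+hdfs+hdfs 0 2016_06_20.txt\n"
    "/Stats/2016_06_20.txt file data-0 125000 8200 1466400000000+420+hdfs+hdfs\n"
)

entry = find_entry(sample_index, "/Stats/2016_06_20.txt")
if entry is not None:
    data_file, offset, length = entry
    url = build_open_url(
        "http://hostname:8443/knox/nm1/webhdfs/v1",
        "/org/project/archived/data/hive/warehouse/test.har",
        data_file,
        offset,
        length,
    )
    print(url)
```

Pasting the printed URL into the browser should return only the bytes of the archived file, provided the _index layout matches the assumption above.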

pooja_khandelwa · ‎06-22-2016

Thank you.


Downloading a file inside a hadoop archive using Apache Knox
