- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Downloading a file inside a hadoop archive using Apache Knox
- Labels:
-
Apache Hadoop
-
Apache Knox
Created ‎06-21-2016 06:19 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am able to download a hdfs file /org/project/archived/data/hive/warehouse/Stats/2016_06_20.txt in a broswer thorugh knox using below URL
Now I have a file in a hadoop archive as below.
har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt
How can i do the same for the above file?
Created ‎06-21-2016 07:17 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
First download the har's _index file located at /org/project/archived/data/hive/warehouse/test.har/_index. Then locate Stats/2016_06_20.txt in _index and its data-n file, the offset within the data file and its length. Suppose it's in data-0 and offset=125000 and file-length=8200, then you can access
http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/test.har/data-0?op...
Check this nicely written blog for a full example and a PHP script which can automate the process.
Created ‎06-21-2016 06:49 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@pooja khandelwal: You may use following approach to get hadoop archived file in your local machine.
hadoop fs -text har:///org/project/archived/data/hive/warehouse/test.har/Stats/2016_06_20.txt > 2016_06_20.txt
Created ‎06-21-2016 07:06 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I want to download the file through browser only.
Created ‎06-21-2016 07:17 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
First download the har's _index file located at /org/project/archived/data/hive/warehouse/test.har/_index. Then locate Stats/2016_06_20.txt in _index and its data-n file, the offset within the data file and its length. Suppose it's in data-0 and offset=125000 and file-length=8200, then you can access
http://hostname:8443/knox/nm1/webhdfs/v1/org/project/archived/data/hive/warehouse/test.har/data-0?op...
Check this nicely written blog for a full example and a PHP script which can automate the process.
Created ‎06-22-2016 03:10 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you.
