Grg
Explorer
Posts: 24
Registered: ‎08-14-2015

Hadoop: build an HAR from local file not in HDFS

Hello,

 

I'm able to build an HAR (Hadoop Archive) from files stored in HDFS:

hadoop archive -archiveName myArchive.har -p /mydata/filesToArchiveTogether * /user/myself/mydata/myarchives

This only works for source files already stored in HDFS.

 

Is it possible for Hadoop to create an HAR from files that are stored on the local Linux file system rather than in HDFS?

 

Thanks for your comments,

Greg.

Posts: 1,896
Kudos: 433
Solutions: 303
Registered: ‎07-31-2013

Re: Hadoop: build an HAR from local file not in HDFS

I haven't tried it, but if you force the MR execution to be local, you should be able to pass file:/// paths as input instead.

Perhaps try this:

hadoop archive -Dfs.defaultFS=file:/// -Dmapreduce.framework.name=local -archiveName myArchive.har -p /mydata/filesToArchiveTogether * /user/myself/mydata/myarchives
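An untested variant of the same idea spells out the file:// scheme on the parent and destination paths instead of relying only on fs.defaultFS. The sketch below only prints the command so it can be reviewed before running. The helper name build_har_cmd is hypothetical, the paths are the ones from this thread and are assumed to exist on the local disk, and the * is quoted so the local shell does not expand it before Hadoop sees it.

```shell
#!/bin/sh
# Paths reused from the thread; assumed here to be local directories.
SRC_PARENT="file:///mydata/filesToArchiveTogether"
DEST_DIR="file:///user/myself/mydata/myarchives"

# Hypothetical helper: prints the hadoop archive command (dry run only).
# $1 = parent directory URI, $2 = destination directory URI.
build_har_cmd() {
  printf '%s\n' "hadoop archive -Dfs.defaultFS=file:/// -Dmapreduce.framework.name=local -archiveName myArchive.har -p $1 '*' $2"
}

build_har_cmd "$SRC_PARENT" "$DEST_DIR"
```

Copy the printed line into a shell on a machine with the Hadoop binaries to try it. Whether the archive tool accepts local paths this way is an assumption, not something confirmed here.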
Grg
Explorer
Posts: 24
Registered: ‎08-14-2015

Re: Hadoop: build an HAR from local file not in HDFS

Hello,

 

Thanks for your answer and your suggestion. I like the idea of forcing the execution to be local :)

 

Unfortunately, it doesn't work. When I use these arguments, I get the following stack trace:

Exception in thread "main" java.lang.StackOverflowError
        at java.net.URI.access$300(URI.java:464)
        at java.net.URI$Parser.scan(URI.java:2996)
        at java.net.URI$Parser.checkChars(URI.java:3019)
        at java.net.URI$Parser.parseHierarchical(URI.java:3105)
        at java.net.URI$Parser.parse(URI.java:3063)
        at java.net.URI.<init>(URI.java:588)
        at java.net.URI.create(URI.java:850)
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:178)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
...

I get the same error in each of these cases:

  • when I run the command from a Linux shell on any machine with the Hadoop binaries
  • when I run the command on the Hadoop NameNode machine
  • when I ask for the archive to be built on the Linux file system
  • when I ask for the archive to be built on HDFS