Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hadoop: build an HAR from local file not in HDFS

Hello,

I'm able to build an HAR (Hadoop Archive) from files stored in HDFS:

hadoop archive -archiveName myArchive.har -p /mydata/filesToArchiveTogether * /user/myself/mydata/myarchives

This only works for source files that are already stored in HDFS.
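For reference, an archive built this way is read back through the `har://` scheme. When no scheme is given, the URI takes the form `har:///<archive path>/<file in archive>` on the default file system. The helper below is a hypothetical sketch: the function names `har_uri` and `list_har` are illustrative and not part of Hadoop. It composes that URI from the destination directory and archive name used in the command above and lists the archive contents with `hdfs dfs -ls -R`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# har_uri DEST_DIR ARCHIVE_NAME  (hypothetical helper)
# Builds the har:// URI for an archive stored on the default file system.
# An absolute DEST_DIR such as /user/x yields har:///user/x/NAME.har.
har_uri() {
  local dest_dir="${1%/}"   # drop a trailing slash, if any
  local name="$2"
  printf 'har://%s/%s\n' "$dest_dir" "$name"
}

# list_har DEST_DIR ARCHIVE_NAME  (hypothetical helper)
# Recursively lists the files stored inside the archive.
# Requires a configured Hadoop client on the PATH.
list_har() {
  hdfs dfs -ls -R "$(har_uri "$1" "$2")"
}
```

For example, `list_har /user/myself/mydata/myarchives myArchive.har` runs `hdfs dfs -ls -R har:///user/myself/mydata/myarchives/myArchive.har`.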

Is it possible for Hadoop to create an HAR from files that are stored on the local Linux file system rather than in HDFS?

Thanks for your comments,

Greg.

Re: Hadoop: build an HAR from local file not in HDFS

I haven't tried it, but if you force the MapReduce execution to run locally, you should be able to pass file:/// paths as input instead.

Perhaps try this:

hadoop archive -Dfs.defaultFS=file:/// -Dmapreduce.framework.name=local -archiveName myArchive.har -p /mydata/filesToArchiveTogether * /user/myself/mydata/myarchives

Re: Hadoop: build an HAR from local file not in HDFS

Hello,

Thanks for your answer and suggestion. I like the idea of forcing the execution to be local :)

Unfortunately, it doesn't work. When I pass these arguments, I get the following stack trace:

Exception in thread "main" java.lang.StackOverflowError
        at java.net.URI.access$300(URI.java:464)
        at java.net.URI$Parser.scan(URI.java:2996)
        at java.net.URI$Parser.checkChars(URI.java:3019)
        at java.net.URI$Parser.parseHierarchical(URI.java:3105)
        at java.net.URI$Parser.parse(URI.java:3063)
        at java.net.URI.<init>(URI.java:588)
        at java.net.URI.create(URI.java:850)
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:178)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
...

I get the same error:

  • if I run this command from a Linux shell on any machine with the Hadoop binaries
  • if I run this command from the Hadoop NameNode machine
  • if I write the archive to the local Linux file system
  • if I write the archive to HDFS
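Because forcing local execution fails here, one common alternative, not confirmed in this thread, is to copy the local files into an HDFS staging directory first and then archive the staged copy. The script below is a minimal sketch of that approach. The function names `stage_and_archive` and `run`, the `DRY_RUN` switch, and all paths are hypothetical placeholders. It assumes a Hadoop release whose `hadoop archive` accepts `-p <parent>` with no `<src>` arguments to archive the whole parent directory, as described in the Hadoop Archives guide.

```shell
#!/usr/bin/env bash
# Sketch: archive files from the local Linux file system by staging them in
# HDFS first, then running `hadoop archive` on the staged copy.
# stage_and_archive, run and DRY_RUN are hypothetical names for illustration.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# stage_and_archive LOCAL_DIR HDFS_STAGING_DIR ARCHIVE_NAME HDFS_DEST_DIR
stage_and_archive() {
  local local_dir="$1" staging="$2" name="$3" dest="$4"

  # 1. Make sure the parent of the staging directory exists in HDFS.
  run hdfs dfs -mkdir -p "$(dirname "$staging")"

  # 2. Copy the local directory into HDFS. Assuming $staging does not exist
  #    yet, the local directory is copied to exactly that path.
  run hdfs dfs -put "$local_dir" "$staging"

  # 3. Archive the staged copy. With -p <parent> and no <src> arguments,
  #    the whole parent directory is archived (assumption, see lead-in).
  run hadoop archive -archiveName "$name" -p "$staging" "$dest"
}
```

For example, `DRY_RUN=1 stage_and_archive /local/filesToArchiveTogether /user/myself/staging/filesToArchiveTogether myArchive.har /user/myself/mydata/myarchives` only prints the three commands, which is a cheap way to check the paths. Dropping `DRY_RUN=1` runs them for real. Once the archive has been verified, the staging copy can be removed with `hdfs dfs -rm -r`.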