Contributor
Posts: 25
Registered: ‎10-11-2013

Hadoop error while processing file with brackets.


Hello. I have a lot of files of different types (*.doc, *.pdf, and so on) that I want to process with MapReduce.

I put them in HDFS and then started a Java MapReduce program using Hue.

If the file names are well formatted and don't contain brackets "(){}[]", everything works fine.

But if there is a file named OPN_last_[age.PDF, I get this error:

Failing Oozie Launcher, Main class [distr.fors.ru.Index], main() threw exception, Illegal file pattern: Unclosed character class near index 17
OPN_last_[age.PDF
^
java.io.IOException: Illegal file pattern: Unclosed character class near index 17
OPN_last_[age.PDF
^
at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:70)
at org.apache.hadoop.fs.GlobFilter.<init>(GlobFilter.java:49)
at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1670)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1627)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:211)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at distr.fors.ru.Index.run(Index.java:78)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at distr.fors.ru.Index.main(Index.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 17
OPN_last_[age.PDF
^
at org.apache.hadoop.fs.GlobPattern.error(GlobPattern.java:167)
at org.apache.hadoop.fs.GlobPattern.set(GlobPattern.java:151)
at org.apache.hadoop.fs.GlobPattern.<init>(GlobPattern.java:42)
at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:66)
... 32 more

If there is a file like {2011-01-27} (3769330).pdf, I get this error:

Input Pattern hdfs://fd-bigdata.distr.fors.ru:8020/{2011-01-27} (3769330).pdf matches 0 files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at distr.fors.ru.Index.run(Index.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at distr.fors.ru.Index.main(Index.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
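Both failures are consistent with the input path being read as a glob pattern: the traces pass through FileSystem.globStatus and GlobFilter/GlobPattern. The sketch below does not use Hadoop at all. It uses Java's standard glob PathMatcher from java.nio.file, whose syntax is similar to (but not identical with) Hadoop's, to reproduce the two symptoms under that assumption: an unclosed "[" is rejected as a malformed character class, and "{...}" is treated as a group, so a file whose name literally contains the braces never matches. The class and method names are illustrative only.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.PatternSyntaxException;

public class GlobDemo {
    // Compiles `pattern` as a java.nio glob and tests it against `fileName`.
    // Throws PatternSyntaxException if the glob itself is malformed.
    static boolean globMatches(String pattern, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // '[' opens a character class that is never closed, so the glob is rejected.
        try {
            globMatches("OPN_last_[age.PDF", "OPN_last_[age.PDF");
            System.out.println("accepted");
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getDescription());
        }

        // '{...}' is a group: the braces are syntax, not literal characters.
        String name = "{2011-01-27} (3769330).pdf";
        System.out.println(globMatches(name, name));                       // false
        System.out.println(globMatches(name, "2011-01-27 (3769330).pdf")); // true
    }
}
```

The second case mirrors the "matches 0 files" message: the pattern is valid, it just describes a name without braces.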

I really need to process such files. What can I do to solve this problem?

P.S. I am using the latest CDH 4.4.0. 

Cloudera Employee
Posts: 35
Registered: ‎07-08-2013

Re: Hadoop error while processing file with brackets.

Hi,

I don't have a specific answer for you, but I'd guess that because the input paths are treated as glob patterns (the stack traces go through FileSystem.globStatus and GlobPattern, which turn the pattern into a regular expression to match file names), the bracket characters are "special" characters and you need to escape them. This is typically done by putting a "\" in front of them: e.g. "[" becomes "\[" and "{" becomes "\{".
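A minimal sketch of the escaping described above, assuming the glob metacharacters that need a backslash are \, [, ], {, }, * and ?. `escapeGlob` is a hypothetical helper written for illustration; it is not part of the Hadoop API. Parentheses are left as-is on the assumption that they are not glob metacharacters.

```java
public class GlobEscape {
    // Assumed set of glob metacharacters; each gets a leading backslash.
    private static final String GLOB_SPECIAL = "\\[]{}*?";

    // Hypothetical helper: returns `name` with every glob metacharacter escaped.
    static String escapeGlob(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (GLOB_SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeGlob("OPN_last_[age.PDF"));          // OPN_last_\[age.PDF
        System.out.println(escapeGlob("{2011-01-27} (3769330).pdf")); // \{2011-01-27\} (3769330).pdf
    }
}
```

Assuming the driver adds each file as its own input path, the escaped string would replace the raw name when that path is built, for example `FileInputFormat.addInputPath(job, new Path(escapeGlob(fileName)))`.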

Software Engineer | Cloudera, Inc. | http://cloudera.com
Contributor
Posts: 25
Registered: ‎10-11-2013

Re: Hadoop error while processing file with brackets.

Thanks, this solved my problem. 
