Hadoop error while processing file with brackets.
Created on ‎10-11-2013 10:37 AM - edited ‎09-16-2022 01:48 AM
Hello. I have a lot of different files (*.doc, *.pdf and so on) that I want to process with MapReduce.
I put them in HDFS and then started a Java MapReduce program using Hue.
If the files are well formed and don't have brackets "(){}[]" in their names, everything works fine.
But if there is a file named OPN_last_[age.PDF,
I get this error:
Failing Oozie Launcher, Main class [distr.fors.ru.Index], main() threw exception, Illegal file pattern: Unclosed character class near index 17
OPN_last_[age.PDF
                 ^
java.io.IOException: Illegal file pattern: Unclosed character class near index 17
OPN_last_[age.PDF
                 ^
    at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:70)
    at org.apache.hadoop.fs.GlobFilter.<init>(GlobFilter.java:49)
    at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1670)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1627)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:211)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
    at distr.fors.ru.Index.run(Index.java:78)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at distr.fors.ru.Index.main(Index.java:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 17
OPN_last_[age.PDF
                 ^
    at org.apache.hadoop.fs.GlobPattern.error(GlobPattern.java:167)
    at org.apache.hadoop.fs.GlobPattern.set(GlobPattern.java:151)
    at org.apache.hadoop.fs.GlobPattern.<init>(GlobPattern.java:42)
    at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:66)
    ... 32 more
If there is a file like this: {2011-01-27} (3769330).pdf
I get this error:
Input Pattern hdfs://fd-bigdata.distr.fors.ru:8020/{2011-01-27} (3769330).pdf matches 0 files
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
    at distr.fors.ru.Index.run(Index.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at distr.fors.ru.Index.main(Index.java:37)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:495)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
I really need to process such files. What can I do to solve these problems?
P.S. I am using the latest CDH 4.4.0.
Created ‎10-14-2013 11:53 AM
Hi,
I don't have a specific answer for you, but judging by the GlobFilter and GlobPattern frames in the stack trace, the input path is being treated as a glob pattern (similar to a regex), so bracket characters such as "[" and "{" are "special" characters and you need to escape them. This is typically done by putting a "\" in front of them: e.g. "[" becomes "\[".
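As a rough illustration of that escaping, here is a minimal sketch in plain Java. The class name GlobEscaper, the method escapeGlob, and the exact set of characters treated as special are my own assumptions for the example, not something taken from Hadoop's API or from the original program, so check them against the glob rules of your Hadoop version.

```java
// Hypothetical helper (not part of Hadoop): prefixes glob metacharacters
// in a file name with a backslash so they are matched literally.
final class GlobEscaper {

    // Assumed set of characters with special meaning in a glob pattern:
    // backslash, '*', '?', '[', ']', '{', '}'.
    private static final String GLOB_SPECIALS = "\\*?[]{}";

    static String escapeGlob(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (GLOB_SPECIALS.indexOf(c) >= 0) {
                sb.append('\\'); // escape the special character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two file names from the question.
        System.out.println(escapeGlob("OPN_last_[age.PDF"));
        System.out.println(escapeGlob("{2011-01-27} (3769330).pdf"));
    }
}
```

The escaped name would then be used wherever the job builds its input path. That wiring is not shown here because the original program's code is not in the thread, and how backslashes in paths are handled may differ between Hadoop versions.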
Created ‎10-15-2013 11:24 AM
Thanks, this solved my problem.
