org.apache.commons.vfs.FileNotFoundException: Spoon MapReduce failure on CDH 5 (CentOS)
Created on 10-04-2014 11:44 PM - edited 09-16-2022 02:09 AM
I'm trying to run a simple MapReduce job from Spoon 5.1 against a CentOS 6 / CDH 5 cluster. My map tasks are failing with the error below. I assume it is memory-related; I just wondered whether anyone else had encountered this error?
org.apache.commons.vfs.FileNotFoundException: Could not read from
"file:///yarn/nm/usercache/mikejf12/appcache/application_1412471201309_0002/container_1412471201309_0002_01_000002/job.jar"
because it is a not a file.
at org.apache.commons.vfs.provider.AbstractFileObject.getInputStream(Unknown Source)
Created 10-06-2014 11:12 PM
In case any Kettle/Spoon users look this error up: it was caused by an incorrect data type being set on the MapReduce mapper output value. That was not clear from the Hadoop-based error message!
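For readers who know plain Hadoop MapReduce better than the Kettle UI, the kind of mismatch described above looks roughly like the sketch below. This is only an illustrative analogy assuming a word-count style job; the class names are made up for the example and are not Pentaho's generated code.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class MapperOutputTypeMismatch {

    // The mapper declares and emits Text keys and IntWritable values.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                context.write(new Text(token), ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setMapperClass(TokenMapper.class);
        job.setMapOutputKeyClass(Text.class);
        // Deliberate bug for illustration: the declared map output value type
        // (Text) disagrees with the IntWritable the mapper actually emits,
        // so the map tasks fail at runtime.
        job.setMapOutputValueClass(Text.class);
        // ... input/output paths, reducer, and job submission omitted
    }
}

In Kettle the same choice is made in the "Pentaho MapReduce" job entry dialog rather than in code, which is presumably why the failure only surfaces in the YARN container logs.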
Created 01-11-2015 04:57 PM
Hello,
I have CDH 5.2 on CentOS and am using PDI 5.2 (Kettle) on one of the nodes. I'm following the example job that Pentaho provides at http://wiki.pentaho.com/display/BAD/Using+Pentaho+MapReduce+to+Parse+Weblog+Data.
Since the job failed without a clear error message, I used the command below to see the log detail:
[daniel@n1 hadoop-yarn]$ yarn logs -applicationId application_1420841940959_0005
And I see that I'm getting the same error message as you did:
org.apache.commons.vfs.FileNotFoundException: Could not read from "file:///yarn/nm/usercache/daniel/appcache/application_1420841940959_0005/container_1420841940959_0005_01_000002/job.jar" because it is a not a file.
at org.apache.commons.vfs.provider.AbstractFileObject.getInputStream(Unknown Source)
at org.apache.commons.vfs.provider.DefaultFileContent.getInputStream(Unknown Source)
at org.apache.commons.vfs.provider.DefaultURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(URL.java:1037)
at org.scannotation.archiveiterator.IteratorFactory.create(IteratorFactory.java:34)
at org.scannotation.AnnotationDB.scanArchives(AnnotationDB.java:291)
at org.pentaho.di.core.plugins.JarFileCache.getAnnotationDB(JarFileCache.java:58)
at org.pentaho.di.core.plugins.BasePluginType.findAnnotatedClassFiles(BasePluginType.java:258)
at org.pentaho.di.core.plugins.BasePluginType.registerPluginJars(BasePluginType.java:555)
at org.pentaho.di.core.plugins.BasePluginType.searchPlugins(BasePluginType.java:119)
at org.pentaho.di.core.plugins.PluginRegistry.registerType(PluginRegistry.java:570)
at org.pentaho.di.core.plugins.PluginRegistry.init(PluginRegistry.java:525)
at org.pentaho.di.core.KettleClientEnvironment.init(KettleClientEnvironment.java:96)
at org.pentaho.di.core.KettleEnvironment.init(KettleEnvironment.java:91)
at org.pentaho.di.core.KettleEnvironment.init(KettleEnvironment.java:69)
at org.pentaho.hadoop.mapreduce.MRUtil.initKettleEnvironment(MRUtil.java:107)
at org.pentaho.hadoop.mapreduce.MRUtil.getTrans(MRUtil.java:66)
at org.pentaho.hadoop.mapreduce.PentahoMapRunnable.createTrans(PentahoMapRunnable.java:221)
at org.pentaho.hadoop.mapreduce.PentahoMapRunnable.configure(PentahoMapRunnable.java:193)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.FileNotFoundException: /yarn/nm/usercache/daniel/appcache/application_1420841940959_0005/container_1420841940959_0005_01_000002/job.jar (Is a directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.commons.vfs.provider.local.LocalFile.doGetInputStream(Unknown Source)
... 33 more
Would you please share a little more detail on what you did in Kettle to make it run successfully? Were you referring to the "Mapper Input Step Name" field of the "Pentaho Map Reduce" job entry? If so, what did you put in that field?
Thank you,
Daniel
Created 01-11-2015 09:40 PM
As my comment above says, this error was caused for me by specifying the wrong data types for the key and value in the MapReduce job; in my case the key needed to be String and the value needed to be Integer.
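In plain Hadoop terms, a String key and an Integer value correspond naturally to writables such as Text and IntWritable, so a matching declaration would look roughly like this. Again this is an illustrative sketch only; Kettle builds the equivalent configuration from the types chosen in the Pentaho MapReduce job dialog.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class MatchingOutputTypes {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // String key -> Text, Integer value -> IntWritable; these must match
        // what the mapper transformation actually emits.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
    }
}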
