Created on 10-04-2014 11:44 PM - edited 09-16-2022 02:09 AM
I'm trying to run a simple MapReduce job from Spoon 5.1 against a CentOS 6 CDH 5 cluster. My map tasks are failing with the following error.
I assume it is memory related, but I wondered whether anyone else had encountered this error?
org.apache.commons.vfs.FileNotFoundException: Could not read from
"file:///yarn/nm/usercache/mikejf12/appcache/application_1412471201309_0002/container_1412471201309_0002_01_000002/job.jar"
because it is a not a file.
at org.apache.commons.vfs.provider.AbstractFileObject.getInputStream(Unknown Source)
Created 10-06-2014 11:12 PM
In case any Kettle/Spoon users look this error up: it was caused by an incorrect data type being set on a MapReduce mapper output value.
That was not clear from the Hadoop error message!
Created 01-11-2015 04:57 PM
Hello,
I have CDH 5.2 on CentOS and am using PDI 5.2 (Kettle) on one of the nodes. I'm following the example job that Pentaho provided at http://wiki.pentaho.com/display/BAD/Using+Pentaho+MapReduce+to+Parse+Weblog+Data.
I used the command below to see the log detail, since the job failed without a clear error message:
And I see that I'm getting the same error message as you did:
org.apache.commons.vfs.FileNotFoundException: Could not read from "file:///yarn/nm/usercache/daniel/appcache/application_1420841940959_0005/container_1420841940959_0005_01_000002/job.jar" because it is a not a file.
at org.apache.commons.vfs.provider.AbstractFileObject.getInputStream(Unknown Source)
at org.apache.commons.vfs.provider.DefaultFileContent.getInputStream(Unknown Source)
at org.apache.commons.vfs.provider.DefaultURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(URL.java:1037)
at org.scannotation.archiveiterator.IteratorFactory.create(IteratorFactory.java:34)
at org.scannotation.AnnotationDB.scanArchives(AnnotationDB.java:291)
at org.pentaho.di.core.plugins.JarFileCache.getAnnotationDB(JarFileCache.java:58)
at org.pentaho.di.core.plugins.BasePluginType.findAnnotatedClassFiles(BasePluginType.java:258)
at org.pentaho.di.core.plugins.BasePluginType.registerPluginJars(BasePluginType.java:555)
at org.pentaho.di.core.plugins.BasePluginType.searchPlugins(BasePluginType.java:119)
at org.pentaho.di.core.plugins.PluginRegistry.registerType(PluginRegistry.java:570)
at org.pentaho.di.core.plugins.PluginRegistry.init(PluginRegistry.java:525)
at org.pentaho.di.core.KettleClientEnvironment.init(KettleClientEnvironment.java:96)
at org.pentaho.di.core.KettleEnvironment.init(KettleEnvironment.java:91)
at org.pentaho.di.core.KettleEnvironment.init(KettleEnvironment.java:69)
at org.pentaho.hadoop.mapreduce.MRUtil.initKettleEnvironment(MRUtil.java:107)
at org.pentaho.hadoop.mapreduce.MRUtil.getTrans(MRUtil.java:66)
at org.pentaho.hadoop.mapreduce.PentahoMapRunnable.createTrans(PentahoMapRunnable.java:221)
at org.pentaho.hadoop.mapreduce.PentahoMapRunnable.configure(PentahoMapRunnable.java:193)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.FileNotFoundException: /yarn/nm/usercache/daniel/appcache/application_1420841940959_0005/container_1420841940959_0005_01_000002/job.jar (Is a directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.commons.vfs.provider.local.LocalFile.doGetInputStream(Unknown Source)
... 33 more
Would you please share a little more detail on what you did in Kettle to make it run successfully? Were you referring to the "Mapper Input Step Name" of the "Pentaho Map Reduce" job? If then, what did you put for the field?
Thank you,
Daniel
Created 01-11-2015 09:40 PM
As my comment above says, this error was caused for me by specifying the wrong data types for the key and value in the MapReduce job; in my case the key needed to be String and the value needed to be Integer.
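For anyone hitting this from plain Hadoop code rather than the Kettle dialogs: the map-output key/value classes declared on the job must match what the mapper actually emits, and a mismatch can surface as a confusing error far from the real cause. The sketch below is a minimal, self-contained illustration of that kind of check using plain Java stand-ins (Integer in place of Hadoop's IntWritable; the `OutputTypeCheck` class and its `checkValue` method are hypothetical, not the actual Hadoop or Pentaho API):

```java
public class OutputTypeCheck {
    // Stand-in for the declared map-output value class
    // (e.g. IntWritable in a real Hadoop job; plain Integer here).
    static final Class<?> DECLARED_VALUE_CLASS = Integer.class;

    // Mimics the kind of type check done on every record the mapper emits:
    // returns null when the record is accepted, or an error message when
    // the emitted value does not match the declared class.
    static String checkValue(Object value) {
        if (!DECLARED_VALUE_CLASS.isInstance(value)) {
            return "wrong value class: " + value.getClass().getName()
                 + " is not " + DECLARED_VALUE_CLASS.getName();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkValue(1));    // accepted: Integer matches
        System.out.println(checkValue("1"));  // rejected: String was emitted
    }
}
```

The takeaway for the Pentaho Map Reduce job entry is the same: the key/value types configured for the mapper output must agree with the fields the mapper transformation actually produces.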