
org.apache.commons.vfs.FileNotFoundException: Spoon MapReduce failure on CDH 5 / CentOS

New Contributor

I'm trying to run a simple MapReduce job from Spoon 5.1 against a CentOS 6 CDH 5 cluster. My map tasks are failing with the error below. I assume it is memory related; I just wondered whether anyone else has encountered this error?


org.apache.commons.vfs.FileNotFoundException: Could not read from
"file:///yarn/nm/usercache/mikejf12/appcache/application_1412471201309_0002/container_1412471201309_0002_01_000002/job.jar"
because it is a not a file.
at org.apache.commons.vfs.provider.AbstractFileObject.getInputStream(Unknown Source)

3 REPLIES

New Contributor (accepted solution)

In case any Kettle / Spoon users look this error up: the failure was caused by an incorrect data type set on the MapReduce mapper output value. That was not at all clear from the Hadoop error message!
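For anyone translating this back to plain Hadoop terms: the mapper output key and value types are declared on the Hadoop job, and those declarations must match what the mapper actually emits. Here is a minimal driver sketch of that declaration, assuming a word-count-style job; the class name and job name are hypothetical, not taken from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TypeDeclarationSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical job, shown only to illustrate where the map output
        // types are declared in a plain Hadoop driver.
        Job job = Job.getInstance(new Configuration(), "type-declaration-sketch");

        // These declarations must agree with what the mapper writes. If the
        // mapper emits a different value type than declared here, the task
        // fails at runtime, and (as this thread shows) the reported error
        // can point somewhere entirely unrelated.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
    }
}

In Kettle/Spoon the same information comes from the type fields of the Pentaho Map Reduce job entry, which appears to be why a wrong output value type there surfaces as a confusing Hadoop-side failure.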

New Contributor

Hello,

I have CDH 5.2 on CentOS and am using PDI 5.2 (Kettle) on one of the nodes. I'm following the example job that Pentaho provided at http://wiki.pentaho.com/display/BAD/Using+Pentaho+MapReduce+to+Parse+Weblog+Data.

Since the job failed without a clear error message, I used the command below to see the log detail:

[daniel@n1 hadoop-yarn]$ yarn logs -applicationId application_1420841940959_0005

And I see that I'm getting the same error message you did:

org.apache.commons.vfs.FileNotFoundException: Could not read from "file:///yarn/nm/usercache/daniel/appcache/application_1420841940959_0005/container_1420841940959_0005_01_000002/job.jar" because it is a not a file.
at org.apache.commons.vfs.provider.AbstractFileObject.getInputStream(Unknown Source)
at org.apache.commons.vfs.provider.DefaultFileContent.getInputStream(Unknown Source)
at org.apache.commons.vfs.provider.DefaultURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(URL.java:1037)
at org.scannotation.archiveiterator.IteratorFactory.create(IteratorFactory.java:34)
at org.scannotation.AnnotationDB.scanArchives(AnnotationDB.java:291)
at org.pentaho.di.core.plugins.JarFileCache.getAnnotationDB(JarFileCache.java:58)
at org.pentaho.di.core.plugins.BasePluginType.findAnnotatedClassFiles(BasePluginType.java:258)
at org.pentaho.di.core.plugins.BasePluginType.registerPluginJars(BasePluginType.java:555)
at org.pentaho.di.core.plugins.BasePluginType.searchPlugins(BasePluginType.java:119)
at org.pentaho.di.core.plugins.PluginRegistry.registerType(PluginRegistry.java:570)
at org.pentaho.di.core.plugins.PluginRegistry.init(PluginRegistry.java:525)
at org.pentaho.di.core.KettleClientEnvironment.init(KettleClientEnvironment.java:96)
at org.pentaho.di.core.KettleEnvironment.init(KettleEnvironment.java:91)
at org.pentaho.di.core.KettleEnvironment.init(KettleEnvironment.java:69)
at org.pentaho.hadoop.mapreduce.MRUtil.initKettleEnvironment(MRUtil.java:107)
at org.pentaho.hadoop.mapreduce.MRUtil.getTrans(MRUtil.java:66)
at org.pentaho.hadoop.mapreduce.PentahoMapRunnable.createTrans(PentahoMapRunnable.java:221)
at org.pentaho.hadoop.mapreduce.PentahoMapRunnable.configure(PentahoMapRunnable.java:193)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.FileNotFoundException: /yarn/nm/usercache/daniel/appcache/application_1420841940959_0005/container_1420841940959_0005_01_000002/job.jar (Is a directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.commons.vfs.provider.local.LocalFile.doGetInputStream(Unknown Source)
... 33 more

Would you please share a little more detail on what you did in Kettle to make it run successfully? Were you referring to the "Mapper Input Step Name" field of the "Pentaho Map Reduce" job entry? If so, what did you put in that field?

Thank you,

Daniel

New Contributor

As my comment above says, this error was caused for me by specifying the wrong data types for the key and value in the MapReduce job; in my case the key needed to be a string and the value needed to be an integer.
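For reference, here is roughly how "key = string, value = integer" looks in a plain Hadoop mapper, with the string mapped to Text and the integer to IntWritable. This is a sketch under those assumptions, not the poster's actual Kettle transformation; the class name and the tokenizing logic are illustrative.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper emitting (string key, integer value) pairs.
public class StringIntMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // The emitted (Text, IntWritable) pairs must agree with the Mapper
        // generics above and with the job's declared map output classes;
        // a mismatch only shows up at runtime.
        for (String token : line.toString().split("\\s+")) {
            word.set(token);
            context.write(word, one);
        }
    }
}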