
org.apache.commons.vfs.FileNotFoundException: Spoon MapReduce failure on CDH 5 / CentOS

New Contributor

I'm trying to run a simple MapReduce job from Spoon 5.1 against a CentOS 6 CDH 5 cluster. My map tasks are failing with the error below. I assume it is memory related; I just wondered whether anyone else has encountered this error?


org.apache.commons.vfs.FileNotFoundException: Could not read from
"file:///yarn/nm/usercache/mikejf12/appcache/application_1412471201309_0002/container_1412471201309_0002_01_000002/job.jar"
because it is a not a file.
at org.apache.commons.vfs.provider.AbstractFileObject.getInputStream(Unknown Source)

3 REPLIES

New Contributor (accepted solution)

In case any Kettle / Spoon users look this error up: the failure was caused by an incorrect data type set on the MapReduce mapper output value. That was not at all clear from the Hadoop error message!
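For anyone translating this back to plain Hadoop terms: the mapper output key and value types are declared on the Hadoop job, and those declarations must match what the mapper actually emits. Here is a minimal driver sketch of that declaration, assuming a word-count-style job; the class name and job name are hypothetical, not taken from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TypeDeclarationSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical job, shown only to illustrate where the map output
        // types are declared in a plain Hadoop driver.
        Job job = Job.getInstance(new Configuration(), "type-declaration-sketch");

        // These declarations must agree with what the mapper writes. If the
        // mapper emits a different value type than declared here, the task
        // fails at runtime, and (as this thread shows) the reported error
        // can point somewhere entirely unrelated.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
    }
}

In Kettle/Spoon the same information comes from the type fields of the Pentaho Map Reduce job entry, which appears to be why a wrong output value type there surfaces as a confusing Hadoop-side failure.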

New Contributor

Hello,

I have CDH 5.2 on CentOS and am using PDI 5.2 (Kettle) on one of the nodes. I'm following the example job that Pentaho provided at http://wiki.pentaho.com/display/BAD/Using+Pentaho+MapReduce+to+Parse+Weblog+Data.

Since the job failed without a clear error message, I used the command below to see the log detail:

[daniel@n1 hadoop-yarn]$ yarn logs -applicationId application_1420841940959_0005

And I see that I'm getting the same error message you did:

org.apache.commons.vfs.FileNotFoundException: Could not read from "file:///yarn/nm/usercache/daniel/appcache/application_1420841940959_0005/container_1420841940959_0005_01_000002/job.jar" because it is a not a file.
at org.apache.commons.vfs.provider.AbstractFileObject.getInputStream(Unknown Source)
at org.apache.commons.vfs.provider.DefaultFileContent.getInputStream(Unknown Source)
at org.apache.commons.vfs.provider.DefaultURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(URL.java:1037)
at org.scannotation.archiveiterator.IteratorFactory.create(IteratorFactory.java:34)
at org.scannotation.AnnotationDB.scanArchives(AnnotationDB.java:291)
at org.pentaho.di.core.plugins.JarFileCache.getAnnotationDB(JarFileCache.java:58)
at org.pentaho.di.core.plugins.BasePluginType.findAnnotatedClassFiles(BasePluginType.java:258)
at org.pentaho.di.core.plugins.BasePluginType.registerPluginJars(BasePluginType.java:555)
at org.pentaho.di.core.plugins.BasePluginType.searchPlugins(BasePluginType.java:119)
at org.pentaho.di.core.plugins.PluginRegistry.registerType(PluginRegistry.java:570)
at org.pentaho.di.core.plugins.PluginRegistry.init(PluginRegistry.java:525)
at org.pentaho.di.core.KettleClientEnvironment.init(KettleClientEnvironment.java:96)
at org.pentaho.di.core.KettleEnvironment.init(KettleEnvironment.java:91)
at org.pentaho.di.core.KettleEnvironment.init(KettleEnvironment.java:69)
at org.pentaho.hadoop.mapreduce.MRUtil.initKettleEnvironment(MRUtil.java:107)
at org.pentaho.hadoop.mapreduce.MRUtil.getTrans(MRUtil.java:66)
at org.pentaho.hadoop.mapreduce.PentahoMapRunnable.createTrans(PentahoMapRunnable.java:221)
at org.pentaho.hadoop.mapreduce.PentahoMapRunnable.configure(PentahoMapRunnable.java:193)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.FileNotFoundException: /yarn/nm/usercache/daniel/appcache/application_1420841940959_0005/container_1420841940959_0005_01_000002/job.jar (Is a directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.commons.vfs.provider.local.LocalFile.doGetInputStream(Unknown Source)
... 33 more

Would you please share a little more detail on what you did in Kettle to make it run successfully? Were you referring to the "Mapper Input Step Name" field of the "Pentaho Map Reduce" job entry? If so, what did you put in that field?

Thank you,

Daniel

New Contributor

As my comment above says, this error was caused for me by specifying the wrong data types for the key and value in the MapReduce job; in my case the key needed to be a string and the value needed to be an integer.
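For reference, here is roughly how "key = string, value = integer" looks in a plain Hadoop mapper, with the string mapped to Text and the integer to IntWritable. This is a sketch under those assumptions, not the poster's actual Kettle transformation; the class name and the tokenizing logic are illustrative.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper emitting (string key, integer value) pairs.
public class StringIntMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // The emitted (Text, IntWritable) pairs must agree with the Mapper
        // generics above and with the job's declared map output classes;
        // a mismatch only shows up at runtime.
        for (String token : line.toString().split("\\s+")) {
            word.set(token);
            context.write(word, one);
        }
    }
}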