Created on 03-21-2014 09:11 AM - edited 09-16-2022 01:55 AM
Hi,
I'm getting a ClassNotFoundException when running a MapReduce job using Oozie.
I'm using CDH4.2.1
The command I am using to start my job is:
oozie job -oozie http://localhost:11000/oozie -config job.properties -DstartDateTime=`date +%FT%RZ`
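(For readers unfamiliar with the backtick substitution: the shell expands `date +%FT%RZ` to an ISO-8601-style timestamp before the oozie client runs, so startDateTime is fixed at submission time. Note the trailing Z is a literal suffix, not a timezone conversion.)

```shell
# %F = YYYY-MM-DD, %R = HH:MM; prints e.g. 2014-03-21T09:11Z
date +%FT%RZ
```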
I have multiple jars which I am adding to the classpath using:
DistributedCache.addArchiveToClassPath()
However, these jars do not appear to be on the classpath of my MapReduce job.
I have two workarounds:
1. Running the job directly with 'hadoop jar' and the '-libjars' option.
2. Packaging all the jars into a single jar-with-dependencies. I suspect this works because the jar-with-dependencies is added to the classpath by job.setJarByClass(), which would imply that the problem lies with the DistributedCache.
Does anyone have any ideas how I can get this working through Oozie with multiple jars?
Thanks,
Andrew
Created 04-01-2014 02:59 AM
Hi, just to follow up on this: I have now solved the problem.
There were two things I needed to do:
1. In addition to adding oozie.libpath to my job.properties, I also needed to set oozie.use.system.libpath=true
2. Before, I was using the following code to add files to the DistributedCache:

    FileStatus[] status = fs.listStatus(new Path("/application/lib"));
    if (status != null) {
        for (int i = 0; i < status.length; ++i) {
            if (!status[i].isDir()) {
                DistributedCache.addFileToClassPath(status[i].getPath(), job.getConfiguration(), fs);
            }
        }
    }
This appeared to be causing a classpath issue because FileStatus.getPath() returns a fully qualified URI, so the hdfs://hostname prefix was being added before the HDFS path.
Now I am using the following to strip that prefix and add only the absolute HDFS path:
    FileStatus[] status = fs.listStatus(new Path("/application/lib"));
    if (status != null) {
        for (int i = 0; i < status.length; ++i) {
            if (!status[i].isDir()) {
                Path distCachePath = new Path(status[i].getPath().toUri().getPath());
                DistributedCache.addFileToClassPath(distCachePath, job.getConfiguration(), fs);
            }
        }
    }
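The fix hinges on toUri().getPath(), which drops the scheme and authority from a fully qualified Hadoop path. A standalone sketch of that behaviour using only java.net.URI (the hostname and port here are made up for illustration):

```java
import java.net.URI;

public class StripSchemeDemo {
    public static void main(String[] args) {
        // A fully qualified URI, as FileStatus.getPath().toUri() would return it
        URI qualified = URI.create("hdfs://namenode.example.com:8020/application/lib/foo.jar");

        // getPath() keeps only the absolute path, dropping "hdfs://namenode.example.com:8020"
        String pathOnly = qualified.getPath();
        System.out.println(pathOnly); // prints /application/lib/foo.jar
    }
}
```

Wrapping that string in new Path(...) means the classpath entry handed to addFileToClassPath() no longer carries a host-specific prefix.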
Thank you to those who replied to my original query for pointing me in the right direction.
Andrew
Created 03-21-2014 09:21 AM
Hey Andrew,
Where are you storing the jars that you need in the distributed cache? Are they in the "${oozie.wf.application.path}/lib" or another location in HDFS?
Thanks
Chris
Created 03-21-2014 09:23 AM
Also, you're not trying to access HBase from the MR job, are you? Which class are you getting the CNF exception on?
Created 03-21-2014 09:58 AM
Thank you for the responses.
Where are you storing the jars that you need in the distributed cache?
The jars are stored on hdfs under "/application/lib"
Are they in the "${oozie.wf.application.path}/lib" or another location in HDFS?
I'm using ${oozie.coord.application.path} in my job.properties file.
If I use ${oozie.wf.application.path} instead then I get a CNF error because it can't find my ToolRunner class.
Should I be using ${oozie.wf.application.path} but adding my ToolRunner class to the HADOOP_CLASSPATH?
Also, you're not trying to access HBase from the MR job, are you?
I'm not using HBase. The MapReduce job ingests into Accumulo.
Which class are you getting the CNF exception on?
The CNF exception is on a class within one of my jar files. It's the super class of my Mapper.
Thanks,
Andrew
Created 03-21-2014 11:41 AM
Hey Andrew,
Can you try adding "oozie.libpath=/application/lib" to your job.properties and see if that helps?
Thanks
Chris
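(For anyone following along, the suggestion above sketched as a job.properties fragment; combined with the oozie.use.system.libpath setting from the accepted answer. The values are illustrative placeholders:)

```
# job.properties (illustrative values)
oozie.libpath=/application/lib
oozie.use.system.libpath=true
```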
Created 03-25-2014 08:07 AM
Hi,
Can you try adding "oozie.libpath=/application/lib" to your job.properties and see if that helps?
I added the "oozie.libpath" property to my job.properties file (in the format "hdfs://fqdn:8020/application/lib").
The MapReduce job ran successfully.
However, I had previously added local copies of my jars to the HADOOP_CLASSPATH using the "MapReduce Service Environment Safety Valve" property in Cloudera Manager (ie HADOOP_CLASSPATH=/localpath/foo.jar:/localpath/bar.jar)
Upon removing this my MapReduce job failed with the same CNF error as before.
Any ideas why I can only get this working with the jars on both HDFS and the local file system?
Surely the jars should only need to be on HDFS?
Thanks for your help,
Andrew