ClassNotFoundException when running a MapReduce job
Labels:
- Apache Hadoop
- Apache Oozie
- MapReduce
Created on 03-21-2014 09:11 AM - edited 09-16-2022 01:55 AM
Hi,
I'm getting a ClassNotFoundException when running a MapReduce job using Oozie.
I'm using CDH4.2.1
The command I am using to start my job is:
oozie job -oozie http://localhost:11000/oozie -config job.properties -DstartDateTime=`date +%FT%RZ`
I have multiple jars which I am adding to the classpath using:
DistributedCache.addArchiveToClassPath()
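For illustration, the calls look roughly like this (the jar names here are placeholders, not my real layout; 'job' is the org.apache.hadoop.mapreduce.Job being configured):

// Needs: org.apache.hadoop.fs.Path and org.apache.hadoop.filecache.DistributedCache
DistributedCache.addArchiveToClassPath(
        new Path("/application/lib/dependency-a.jar"), job.getConfiguration());
DistributedCache.addArchiveToClassPath(
        new Path("/application/lib/dependency-b.jar"), job.getConfiguration());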
However, these jars do not appear to be on the classpath of my MapReduce job.
I have two workarounds:
1. Using the 'hadoop jar' command with '-libjars' (see the example after this list).
2. Packaging all the jars as a single jar-with-dependencies. I suspect this works because the jar-with-dependencies gets added to the classpath by job.setJarByClass(), which would imply that the problem lies with the DistributedCache.
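For reference, the '-libjars' workaround looks something like this (the jar names and driver class are placeholders; the generic '-libjars' option is only picked up if the driver runs through ToolRunner/GenericOptionsParser):

hadoop jar myapp.jar com.example.MyDriver -libjars /localpath/foo.jar,/localpath/bar.jar <job arguments>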
Does anyone have any ideas how I can get this working through Oozie with multiple jars?
Thanks,
Andrew
Created 03-21-2014 09:21 AM
Hey Andrew,
Where are you storing the jars that you need in the distributed cache? Are they in the "${oozie.wf.application.path}/lib" or another location in HDFS?
Thanks
Chris
Created 03-21-2014 09:23 AM
Also, you're not trying to access HBase from the MR job, are you? Which class are you getting the CNF exception on?
Created 03-21-2014 09:58 AM
Thank you for the responses.
Where are you storing the jars that you need in the distributed cache?
The jars are stored on hdfs under "/application/lib"
Are they in the "${oozie.wf.application.path}/lib" or another location in HDFS?
I'm using ${oozie.coord.application.path} in my job.properties file.
If I use ${oozie.wf.application.path} instead then I get a CNF error because it can't find my ToolRunner class.
Should I be using ${oozie.wf.application.path} but adding my ToolRunner class to the HADOOP_CLASSPATH?
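(For context, the relevant entry in my job.properties currently looks something like this; the path after the namenode address is a placeholder:)

oozie.coord.application.path=hdfs://fqdn:8020/application/coordinator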
Also, you're not trying to access HBase from the MR job, are you?
I'm not using HBase. The MapReduce job ingests into Accumulo.
Which class are you getting the CNF exception on?
The CNF exception is on a class within one of my jar files. It's the super class of my Mapper.
Thanks,
Andrew
Created 03-21-2014 11:41 AM
Hey Andrew,
Can you try adding "oozie.libpath=/application/lib" to your job.properties and see if that helps?
Thanks
Chris
Created 03-25-2014 08:07 AM
Hi,
Can you try adding "oozie.libpath=/application/lib" to your job.properties and see if that helps?
I added the "oozie.libpath" property to my job.properties file (in the format "hdfs://fqdn:8020/application/lib").
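In other words, the new entry in job.properties reads:

oozie.libpath=hdfs://fqdn:8020/application/lib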
The MapReduce job ran successfully.
However, I had previously added local copies of my jars to the HADOOP_CLASSPATH using the "MapReduce Service Environment Safety Valve" property in Cloudera Manager (i.e. HADOOP_CLASSPATH=/localpath/foo.jar:/localpath/bar.jar).
Upon removing this, my MapReduce job failed with the same CNF error as before.
Any ideas why I can only get this working with the jars on both HDFS and the local file system?
Surely I should only need the jars on HDFS?
Thanks for your help,
Andrew
Created 04-01-2014 02:59 AM
Hi, just to follow up on this: I have now solved the problem.
There were two things that I needed to do:
1. In addition to adding oozie.libpath to my job.properties, I also needed to include oozie.use.system.libpath=true (see the job.properties excerpt after this list).
2. Before, I was using the following code to add files to the DistributedCache:
// Needs: org.apache.hadoop.fs.FileStatus, org.apache.hadoop.fs.Path,
//        org.apache.hadoop.filecache.DistributedCache
FileStatus[] status = fs.listStatus(new Path("/application/lib"));
if (status != null) {
    for (int i = 0; i < status.length; ++i) {
        if (!status[i].isDir()) {
            DistributedCache.addFileToClassPath(status[i].getPath(), job.getConfiguration(), fs);
        }
    }
}
This appeared to be causing a classpath issue because it was prepending hdfs://hostname (the filesystem scheme and authority) to the HDFS path.
Now I am using the following to strip that prefix and add only the absolute HDFS path:
FileStatus[] status = fs.listStatus(new Path("/application/lib"));
if (status != null) {
    for (int i = 0; i < status.length; ++i) {
        if (!status[i].isDir()) {
            // toUri().getPath() drops the scheme and authority (hdfs://hostname),
            // leaving just the absolute path
            Path distCachePath = new Path(status[i].getPath().toUri().getPath());
            DistributedCache.addFileToClassPath(distCachePath, job.getConfiguration(), fs);
        }
    }
}
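For reference, here are the two job.properties entries from step 1 together (using the same placeholder namenode address as earlier in the thread):

oozie.libpath=hdfs://fqdn:8020/application/lib
oozie.use.system.libpath=true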
Thank you to those who replied to my original query for pointing me in the right direction.
Andrew
