ClassNotFoundException when running a MapReduce job

Explorer

Hi,

 

I'm getting a ClassNotFoundException when running a MapReduce job through Oozie.
I'm using CDH 4.2.1.

 

The command I am using to start my job is:

 

oozie job -oozie http://localhost:11000/oozie -config job.properties -DstartDateTime=`date +%FT%RZ`

I have multiple jars which I am adding to the classpath using:

DistributedCache.addArchiveToClassPath()

However, these jars do not appear to be on the classpath of my MapReduce job.

 

I have two workarounds:
1. Using the 'hadoop jar -libjars' command.
2. Packaging all the jars as a single jar-with-dependencies. I suspect this works because the jar-with-dependencies is added to the classpath by job.setJarByClass(), which would imply that the problem lies with the DistributedCache.

 

Does anyone have any ideas on how I can get this working through Oozie with multiple jars?

 

Thanks,
Andrew

1 ACCEPTED SOLUTION

Explorer

Hi, just to follow up on this: I have now solved the problem.

 

There were two things that I needed to do:

 

1. In addition to adding oozie.libpath to my job.properties, I also needed to include oozie.use.system.libpath=true.

 

2. Previously, I was using the following code to add files to the DistributedCache:

    

FileStatus[] status = fs.listStatus(new Path("/application/lib"));

if (status != null) {
    for (int i = 0; i < status.length; ++i) {
        if (!status[i].isDir()) {
            // getPath() is passed through unchanged, hdfs://hostname prefix included
            DistributedCache.addFileToClassPath(status[i].getPath(), job.getConfiguration(), fs);
        }
    }
}

 

This appeared to cause the classpath issue, because each entry was added with hdfs://hostname in front of the HDFS path.

 

Now I am using the following to strip that prefix and add only the absolute HDFS path:

FileStatus[] status = fs.listStatus(new Path("/application/lib"));

if (status != null) {
    for (int i = 0; i < status.length; ++i) {
        if (!status[i].isDir()) {
            // Keep only the path component, dropping the hdfs://hostname prefix
            Path distCachePath = new Path(status[i].getPath().toUri().getPath());
            DistributedCache.addFileToClassPath(distCachePath, job.getConfiguration(), fs);
        }
    }
}
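
To illustrate what the toUri().getPath() step does, here is a minimal sketch that needs no Hadoop libraries. It uses java.net.URI to show how taking only the path component drops the hdfs://hostname prefix. The class name, method name, sample host, port and jar name are all hypothetical and are not taken from the job above.

```java
import java.net.URI;

class PathPrefixDemo {
    // Returns only the path component of a location, dropping any
    // scheme (hdfs://) and authority (hostname:port) in front of it.
    public static String stripSchemeAndAuthority(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        // Hypothetical fully qualified location of a jar on HDFS.
        String qualified = "hdfs://hostname:8020/application/lib/foo.jar";
        System.out.println(stripSchemeAndAuthority(qualified));
    }
}
```

A location that is already a plain absolute path comes back unchanged, so applying this step to every entry should be harmless.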

 

 

 

Thank you to those who replied to my original query for pointing me in the right direction.

 

Andrew


REPLIES

Super Collaborator

Hey Andrew,

 

Where are you storing the jars that you need in the distributed cache?  Are they in the "${oozie.wf.application.path}/lib" or another location in HDFS?

 

Thanks

Chris

Guru

Also, you're not trying to access HBase from the MR job, are you?  Which class are you getting the CNF exception on?

Explorer

Thank you for the responses.

 

Where are you storing the jars that you need in the distributed cache? 

The jars are stored on HDFS under "/application/lib".

 

Are they in the "${oozie.wf.application.path}/lib" or another location in HDFS?

I'm using ${oozie.coord.application.path} in my job.properties file.

If I use ${oozie.wf.application.path} instead then I get a CNF error because it can't find my ToolRunner class.

 

Should I be using ${oozie.wf.application.path} but adding my ToolRunner class to the HADOOP_CLASSPATH?

 

Also, you're not trying to access HBase from the MR job, are you?  

I'm not using HBase.  The MapReduce job ingests into Accumulo.

 

Which class are you getting the CNF exception on? 

The CNF exception is on a class within one of my jar files.  It's the super class of my Mapper.

 

Thanks,

Andrew

 

 

 

Super Collaborator

Hey Andrew,

 

Can you try adding "oozie.libpath=/application/lib" to your job.properties and see if that helps?

 

Thanks

Chris

Explorer

Hi,

 

Can you try adding "oozie.libpath=/application/lib" to your job.properties and see if that helps?

I added the "oozie.libpath" property to my job.properties file, in the format "hdfs://fqdn:8020/application/lib".

The MapReduce job ran successfully.
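
For reference, the job.properties entries discussed in this thread might look roughly like the fragment below. This is only a sketch: hdfs://fqdn:8020 is the placeholder used in this post, and oozie.use.system.libpath=true is the additional setting named in the accepted solution.

```properties
# Sketch only: values are taken from this thread; adjust host, port and path for your cluster.
oozie.libpath=hdfs://fqdn:8020/application/lib
oozie.use.system.libpath=true
```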

 

However, I had previously added local copies of my jars to the HADOOP_CLASSPATH using the "MapReduce Service Environment Safety Valve" property in Cloudera Manager (i.e. HADOOP_CLASSPATH=/localpath/foo.jar:/localpath/bar.jar).

After removing this, my MapReduce job failed with the same CNF error as before.

 

Any ideas why this only works when the jars are on both HDFS and the local file system?

Surely I should only need the jars on HDFS?

 

Thanks for your help,

Andrew

 

 
