
Can't get rid of NoClassDefFoundError: org/apache/hadoop/hive/serde2/columnar/BytesRefArrayWritable

Contributor

I'm using a Java MapReduce job to write data to a directory that will be interpreted as a Hive table in RCFile format.

 

To do this, I need the org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable class, which can be found in hive-serde-0.13.1-cdh5.3.3.jar. So far, so good.

 

I've included the jar in my command line like this:

/usr/bin/hadoop jar /path/lib/awesome-mapred-0.9.6.jar com.awesome.HiveLoadController -libjars /path/lib/postgresql-8.4-702.jdbc4.jar,/path/lib/hive-serde-0.13.1-cdh5.3.3.jar
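
For context, -libjars is only picked up when the main class parses Hadoop's generic options, typically by running through ToolRunner/GenericOptionsParser. A minimal sketch of such a driver (the class and job names here are illustrative, not the actual com.awesome.HiveLoadController) looks like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Illustrative skeleton only -- not the actual com.awesome.HiveLoadController.
public class ExampleDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() is already populated by the GenericOptionsParser, so the
        // -libjars entries are attached to the job that gets submitted.
        Job job = Job.getInstance(getConf(), "example-job");
        job.setJarByClass(ExampleDriver.class);
        // ... mapper/reducer/input/output setup goes here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-libjars, -files, -D ...) before
        // passing the remaining arguments to run().
        System.exit(ToolRunner.run(new Configuration(), new ExampleDriver(), args));
    }
}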

 

I know for certain that it is loading the Postgres library, because it correctly prints the retrieved information before it throws the error.

I know that it is grabbing and transferring that jar file because it throws a fit if I move it from the /path/lib directory.

I know that the object exists in the jar because I've unpacked it and looked.

 

Is there something in the rest of the lib path that might be interfering with it finding that class in the jar?

 



Mentor
What is the full stack trace? That'd be necessary to tell where the failure point lies.

If it fails at the driver/client end, you will likely also need to add the jar to the HADOOP_CLASSPATH environment variable before invoking the command.

If it fails at the MR task end, you'll need to make sure your distributed-cache configuration works (check the job configuration XML to confirm your jar appears in it).
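
For example (the job ID and staging directory below are placeholders that depend on your cluster; the same job.xml is also viewable on the job's Configuration page in the JobHistory web UI):

# Placeholder job ID and staging directory -- adjust for your cluster.
~> hadoop fs -cat /user/$USER/.staging/job_1436931512629_0001/job.xml | grep hive-serde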

Contributor

This error was thrown during the execution of the job controller within the MapReduce job. Here's a similar one with the same root problem.

 

 

 

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/orc/OrcNewOutputFormat
	at com.who.bgt.logloader.schema.OrcFileLoader.run(OrcFileLoader.java:94)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at com.who.bgt.logloader.schema.OrcFileLoader.main(OrcFileLoader.java:45)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 8 more

 

The specific line it is complaining about is here:

 

job.setOutputFormatClass(OrcNewOutputFormat.class);

 

The obvious problem is that it's failing to find the OrcNewOutputFormat class definition, which is in hive-exec-0.13.1-cdh5.3.5.jar.

 

I pushed the jar to hdfs://lib/hive-exec..., and within my main function, I call the following before I run the job:

 

// Ships the jar from HDFS via the distributed cache and adds it to the MR task classpath.
DistributedCache.addFileToClassPath(new Path("/lib/hive-exec-0.13.1-cdh5.3.5.jar"), lConfig);

Can you be more explicit on how I go about making sure my distributed-cache configs work?

 

Optimally, I shouldn't have to stuff this one in the distributed cache since it sits in /opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/jars/hive-exec-0.13.1-cdh5.3.5.jar on all of my slave nodes, but I also can't figure out how to tell MapReduce to look there.

 

Mentor (accepted solution)

Thank you for the additional details!

 

> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

This indicates a problem at the driver end, or as you say, 'during the execution of the job controller'.

 

The issue is that even if you do add the jar to the MR distributed-cache classpath, the class you are executing (the driver) also references the same class. Adding a jar to the distributed tasks' classpath does not also add it to the local (client JVM) classpath.

 

Here's how you can add it to the local classpath, if you use 'hadoop jar' to execute your job:

 

 

~> export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar
~> hadoop jar your-app.jar your.main.Class [arguments]

This adds the jar to your local JVM classpath as well, while your code (the DistributedCache call) continues to add it onto the remote execution classpaths.
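
For the remote half, a sketch of the driver-side call that does that, using the newer Job API (equivalent to the DistributedCache.addFileToClassPath call you already have; the class name is illustrative and the HDFS path is the one from your post):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

// Sketch only: assumes the jar has already been copied to this HDFS path.
public class ClasspathExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "orc-load");
        // Ships the jar through the distributed cache and appends it to the remote
        // task classpath (same effect as the older DistributedCache.addFileToClassPath);
        // it does nothing for the local driver JVM, which is why HADOOP_CLASSPATH is
        // still needed on the submitting machine.
        job.addFileToClassPath(new Path("/lib/hive-exec-0.13.1-cdh5.3.5.jar"));
        // ... remaining job setup and submission would follow here ...
    }
}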

 

> Optimally, I shouldn't have to stuff this one in the distributed cache since it sits in /opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/jars/hive-exec-0.13.1-cdh5.3.5.jar on all of my slave nodes, but I also can't figure out how to tell MapReduce to look there.

 

The MR remote execution classpath is governed by the classpath entries defined in mapred-site.xml and yarn-site.xml, plus the additional elements you add to the DistributedCache. The tasks do not use the entire /opt/cloudera/parcels/CDH/jars/* path; this is deliberate, for isolation and flexibility, as that area may carry multiple versions of the same dependencies.
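
For reference, the properties involved are mapreduce.application.classpath in mapred-site.xml and yarn.application.classpath in yarn-site.xml. A stock entry looks roughly like the following; the actual value on a CDH cluster will differ, so treat this purely as an illustration:

<!-- mapred-site.xml: illustrative stock value only -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>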

 

Does this help?

Contributor

Hi,

 

I'm facing a similar issue with RCFileInputFormat.

 

I'm executing a simple job that reads from an RCFile in the mapper (using RCFileInputFormat) and does an aggregation on the reducer side.

 

I am able to compile the code, but while running it I get a ClassNotFoundException for org.apache.hadoop.hive.ql.io.RCFileInputFormat.

 

I tried adding the jar to the Hadoop classpath, but no luck.

 

Below is the command I ran and the stack trace:

 

--> hadoop jar MRJobRCFile.jar MRJobRCFile /apps/hive/warehouse/7360_0609_rx/day=06-09-2017/hour=13/quarter=2/ /test_9

 

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.RCFileInputFormat not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1649)
	at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:620)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.RCFileInputFormat not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java

 

Should I investigate the job configuration XML? If so, what do I need to check?

 

 

 

Contributor

I was able to make the job run by adding the hive-exec jar to HADOOP_CLASSPATH as well as adding it to the distributed cache.
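
For reference, that combination looks roughly like this (paths and placeholders below are illustrative; -libjars only takes effect if the driver parses generic options via ToolRunner, otherwise the jar can be added to the distributed cache in code as shown earlier in the thread):

~> export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar
~> hadoop jar MRJobRCFile.jar MRJobRCFile -libjars /opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar <input-dir> <output-dir>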

 

Can you shed some light on why we need to export the jar to the classpath and also add it to the distributed cache?