
NoClassDefFoundError thrown when using TypedBytesWritable for SequenceFile key and value

Contributor

I changed an existing (and working) project from using Text to using TypedBytesWritable for the key and value of a sequence file, because I plan to use that file to stream to Python.  This generates the most annoying error:

 

14/01/30 18:11:32 INFO hadoop.ImportFiles: gov.msic.hadoop.ImportFiles started.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/typedbytes/TypedBytesWritable
    at gov.msic.hadoop.ImportFiles.main(ImportFiles.java:304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.typedbytes.TypedBytesWritable
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 6 more
Import failed!

I am using CM 4.6 with CDH 4.5.0-1.cdh4.5.0.p0.30.  I can see the required jar at

 

/opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop-mapreduce/hadoop-streaming.jar

 

so should I figure this is on the classpath?  Or is there some inherent violation of a law of the universe when using TypedBytes in a sequence file?
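For what it's worth, one quick way to see whether that jar actually makes it onto the client classpath is the hadoop classpath command. The lines below are only a sketch against the parcel layout mentioned above; the grep pattern is illustrative:

# Print the client-side classpath one entry per line and look for the streaming jar
hadoop classpath | tr ':' '\n' | grep -i streaming

# Confirm the jar is on disk where the parcel puts it
ls -l /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop-mapreduce/hadoop-streaming.jar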


4 REPLIES

Mentor
Hi,

If you use MR1, then /usr/lib/hadoop-mapreduce will not be on your
classpath, but /usr/lib/hadoop-0.20-mapreduce will be.

For MR1, this set of classes resides under
/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/*.jar, which is
typically not on the default classpath. Can you try the below,
perhaps?

~> export HADOOP_CLASSPATH="/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/*"
~> hadoop jar your-jar...

Contributor

Harsh,

 

I used the "bogus" workaround to get this going by including hadoop-streaming-2.0.0-cdh4.5.0.jar in the /lib directory of my job jar.  Bogus, obviously, because it is version-dependent and the jar is already distributed out to the cluster.  I would prefer not to use the distributed cache for the same reason.
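Concretely, the workaround amounted to something like the following (shown here with the jar tool; ImportFiles.jar is just a placeholder name for my job jar, and the source path is the parcel location noted earlier):

# Bundle the streaming jar inside the job jar's lib/ directory so RunJar
# adds it to the classpath when the job jar is unpacked.
# ImportFiles.jar is a placeholder; the exact streaming jar file may differ.
mkdir -p lib
cp /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop-mapreduce/hadoop-streaming.jar lib/
jar uf ImportFiles.jar lib/hadoop-streaming.jar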

 

So, I am (finally) returning to your suggestion and have tried it without success.  I still get the NoClassDefFoundError on org.apache.hadoop.typedbytes.TypedBytesWritable.

 

Also, doesn't HADOOP_CLASSPATH only apply to the node on which the job is launched?  Or is my understanding flawed?

 

I would *prefer* to make whatever changes are necessary to the MR configuration via CM to make the streaming jar available to all MR jobs.  To that end I modified "MapReduce Client Environment Safety Valve for hadoop-env.sh" with values for HADOOP_CLASSPATH and HADOOP_TASKTRACKER_OPTS, without success.  I am open to your sage advice -- preferably setting the classpath in a "please use current version" way.

Contributor

Hi Bogolese,

 

Try modifying the "MapReduce Client Environment Safety Valve for hadoop-env.sh" with values for HADOOP_CLASSPATH and HADOOP_TASKTRACKER_OPTS, and don't forget to deploy the configuration to the client nodes from the Cloudera Manager web UI. When you run the Deploy Client Configuration command, the configuration is pushed to every node, so your job gets the latest configuration on each node without any issue.
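As an illustration only (the thread does not spell out the exact values), the classpath entry in that safety valve could look something like the line below. The versionless /opt/cloudera/parcels/CDH symlink normally points at the active parcel, which avoids hard-coding a CDH version:

# Illustrative hadoop-env.sh safety-valve entry, not a verified configuration:
# append the streaming jar via the versionless parcel symlink.
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar"
# The exact HADOOP_TASKTRACKER_OPTS value to use is not given in this thread.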

 

 

Regards,
Chirag Patadia.

Contributor

Thank you.  Shame on me for not "finishing the job" as it were.