Created on 01-30-2014 04:02 PM - edited 09-16-2022 01:53 AM
I changed an existing (and working) project from using Text to using TypedBytesWritable for the key and value of a sequence file, because I plan to use that file for streaming to Python. This generates the following, most annoying, error:
14/01/30 18:11:32 INFO hadoop.ImportFiles: gov.msic.hadoop.ImportFiles started.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/typedbytes/TypedBytesWritable
at gov.msic.hadoop.ImportFiles.main(ImportFiles.java:304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.typedbytes.TypedBytesWritable
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 6 more
Import failed!
I am using CM 4.6 with CDH 4.5.0-1.cdh4.5.0.p0.30. I can see the required jar at
/opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop-mapreduce/hadoop-streaming.jar
so I should figure this is on the classpath? Or is there some inherent violation of a law of the universe when using TypedBytes in a sequence file?
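For anyone hitting the same error: the streaming jar being present under the parcel directory does not mean the `hadoop jar` client puts it on the classpath automatically. A common client-side workaround is to prepend it via HADOOP_CLASSPATH before launching. This is a sketch; the jar name (`my-import.jar`) and arguments are placeholders for your own job:

```shell
# Prepend the streaming jar (which contains TypedBytesWritable) to the
# client-side classpath before submitting. Path is from this cluster's
# parcel layout; adjust to yours.
export HADOOP_CLASSPATH="/opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop-mapreduce/hadoop-streaming.jar:${HADOOP_CLASSPATH}"

# Hypothetical job launch; replace jar/class/args with your own.
hadoop jar my-import.jar gov.msic.hadoop.ImportFiles <args>
```

Note this only fixes the launching node; task JVMs on the cluster need the class shipped or configured separately.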
Created 01-30-2014 05:43 PM
Created 02-13-2014 10:33 AM
Harsh,
I used the "bogus" workaround to get this going by including hadoop-streaming-2.0.0-cdh4.5.0.jar in the /lib directory of my job jar. Bogus, obviously, because it is version-dependent and the jar is already distributed to the cluster. I would prefer not to use the distributed cache for the same reason.
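One way to avoid the version dependency, at least on parcel-based installs: Cloudera Manager maintains an unversioned symlink, /opt/cloudera/parcels/CDH, pointing at the currently active parcel, so a classpath entry through it survives CDH upgrades. A sketch, assuming a standard parcel layout:

```shell
# /opt/cloudera/parcels/CDH is a symlink to the active parcel on CM
# parcel installs, so this path is version-independent.
export HADOOP_CLASSPATH="/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar:${HADOOP_CLASSPATH}"
```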
So, I am (finally) returning to your suggestion and have tried it without success. I still get the NoClassDefFoundError on org.apache.hadoop.typedbytes.TypedBytesWritable.
Also, doesn't HADOOP_CLASSPATH only apply to the node on which the job is launched? Or is my understanding flawed?
I would *prefer* to make whatever changes are necessary to the MR configuration via CM to make the streaming jar available to all MR jobs. To that end I modified "MapReduce Client Environment Safety Valve for hadoop-env.sh" with values for HADOOP_CLASSPATH and HADOOP_TASKTRACKER_OPTS, without success. I am open to your sage advice -- preferably setting the classpath in a "please use current version" way.
Created 02-14-2014 02:27 AM
Hi Bogolese,
Try modifying "MapReduce Client Environment Safety Valve for hadoop-env.sh" with values for HADOOP_CLASSPATH and HADOOP_TASKTRACKER_OPTS, and don't forget to deploy the configuration to the client nodes from the Cloudera Manager web UI. When you run the Deploy Client Configuration command, the configuration is pushed to every node, so your job picks up the latest configuration on each node without any issue.
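As an illustration only (not tested on this cluster), the safety valve contents might look like the following. The unversioned /opt/cloudera/parcels/CDH symlink path is an assumption based on standard parcel layouts; verify it on your nodes:

```shell
# Possible contents for the "MapReduce Client Environment Safety Valve
# for hadoop-env.sh". Prepend the streaming jar via the version-
# independent parcel symlink.
export HADOOP_CLASSPATH="/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar:${HADOOP_CLASSPATH}"
```

After saving, run Deploy Client Configuration so the change actually reaches each node's hadoop-env.sh.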
Created 02-14-2014 07:37 AM
Thank you. Shame on me for not "finishing the job" as it were.