05-14-2015 02:09 PM
I think this is an old problem, except for newbies like me.
I have seen some pages that discuss this, both on Stack Overflow and here: http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
I am using a jar file in my map function, and the jar is accompanied by a .so file.
When I run with plain Java (no Hadoop), it works:
java -Djava.library.path=/folder/containing/sharedobjects/ -classpath /externaljars/some.jar:.:/tmp/mypack/ mypack.MyTest
When I try with Hadoop, it throws a ClassNotFoundException:
hadoop jar myjar.jar mypack.MyClass -libjars /externaljars/some.jar -Djava.library.path=/folder/containing/sharedobjects/
Error: java.lang.ClassNotFoundException: Someclass
However, Someclass is present in some.jar.
In my run method I have included this, although it is deprecated:
DistributedCache.addCacheFile(new URI("/folder/containing/sharedobjects/extlib.so#nickname.so"), conf);
I could use "hdfs://host:port/libraries/mylib.so.1#mylib.so" instead, but I am not sure whether I should use localhost:50070.
In my map method I have
If someone can reply back with the latest correct usage, it will be very useful. Thank you.
05-15-2015 05:07 AM
I have also copied the third-party shared object file into these folders
I have already tried various methods suggested by various sources, but I am still looking for a definitive answer that works.
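When a shared object is not picked up, one thing that may be worth checking is which file name the JVM actually searches for. System.loadLibrary("extlib") maps the name to a platform-specific file name (on Linux, libextlib.so) and looks for it in the directories on java.library.path, whereas System.load expects an absolute path to the file. The sketch below uses only the JDK; the class and method names (NativeLibCheck, expectedFileName, findOnLibraryPath) are made up for illustration, and "extlib" is just the library name from the example above.

```java
import java.io.File;

public class NativeLibCheck {

    /** File name that System.loadLibrary(name) would search for on this platform. */
    static String expectedFileName(String name) {
        return System.mapLibraryName(name);
    }

    /**
     * Returns the first file called fileName found in the directories of
     * libraryPath (separated by the platform path separator), or null if none.
     */
    static File findOnLibraryPath(String libraryPath, String fileName) {
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as an unset path
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Library base name; "extlib" is only a placeholder default.
        String name = args.length > 0 ? args[0] : "extlib";
        String fileName = expectedFileName(name);
        String libraryPath = System.getProperty("java.library.path", "");

        System.out.println("java.library.path = " + libraryPath);
        System.out.println("loadLibrary(\"" + name + "\") looks for: " + fileName);

        File hit = findOnLibraryPath(libraryPath, fileName);
        System.out.println(hit != null
                ? "Found: " + hit.getAbsolutePath()
                : "Not found on java.library.path");
    }
}
```

Running this with the same -Djava.library.path value as the plain Java command shows whether the file name matches. If the file is called extlib.so rather than libextlib.so, System.loadLibrary will not find it on Linux, and System.load with the absolute path is the alternative.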
05-16-2015 06:14 AM
I found a workaround, I think.
Basically, my tool runner was throwing a warning saying it could not read the command-line options.
I changed the usage slightly, as per https://hadoopi.wordpress.com/2013/06/05/hadoop-implementing-the-tool-interface-for-mapreduce-driver...
The warning is gone, and I think that because I copied the shared objects into lib/native, it now sees the shared objects.
I think it should also work with command-line options now, without having to copy to lib/native.
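For anyone landing here later, the change described above can be sketched roughly as follows. This is a minimal driver assuming the Hadoop 2.x client libraries are on the classpath, not the exact code from my job. The class name NativeLibDriver, the job name, and the three-argument layout (input path, output path, cache file URI) are placeholders I made up. No mapper is set, so Hadoop's default identity mapper runs. Job.addCacheFile is the non-deprecated counterpart of DistributedCache.addCacheFile.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver name; replace with your own class.
public class NativeLibDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already consumed generic options such as -libjars,
        // -files and -D; args holds only the application arguments.
        if (args.length < 3) {
            System.err.println("Usage: NativeLibDriver <input> <output> <cache-file-uri>");
            return 2;
        }

        // Use the configuration populated by ToolRunner, not a fresh one,
        // otherwise the generic options are lost.
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "native-lib-example");
        job.setJarByClass(NativeLibDriver.class);

        // Ship the shared object through the distributed cache. A fragment
        // such as "#mylib.so" in the URI names the link created for the task.
        job.addCacheFile(new URI(args[2]));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options before calling run().
        System.exit(ToolRunner.run(new Configuration(), new NativeLibDriver(), args));
    }
}
```

With a driver like this, the generic options go after the class name and before the application arguments, for example: hadoop jar myjar.jar NativeLibDriver -libjars /externaljars/some.jar <input> <output> <cache-file-uri>. As far as I understand, -libjars only adds the jar to the task classpath; if the driver itself needs the classes, the jar also has to be on HADOOP_CLASSPATH on the client side.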