I am trying to use my own compression library for MapReduce in Hadoop, and have already set
mapreduce.map.output.compress = true,
mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.DefaultCodec
However, I want to use my own zlib build (implemented more efficiently).
I have already added the directory containing my_libz to LD_LIBRARY_PATH, and "hadoop checknative" does report it.
MapReduce, however, does not seem to honor LD_LIBRARY_PATH; it still uses the Linux system's libz.so.1. Of course, if I run "ln -s -f my_libz.so.1 libz.so.1", MapReduce has to use my_libz. This is not what I intend, though, as I want only Hadoop to use my_libz.so.1, not other applications. If I remove the system's libz.so, the following errors are generated:
Diagnostics: Exception from container-launch.
Container id: container_1547233003817_0045_02_000001
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
Container exited with a non-zero exit code 127
Failing this attempt. Failing the application.
19/01/11 11:49:59 INFO mapreduce.Job: Counters: 0
I can, of course, recompile Hadoop so that libhadoop.so points to my_libz.so.1, but again this is not a solution: there are too many different Linux environments in my Hadoop cluster, and building and distributing a libhadoop.so for each of them would be too troublesome.
So a preferable solution would be to change some shell and/or Hadoop environment variable through configuration files, without recompilation.
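A minimal sketch of what such a configuration-only approach might look like, offered as an assumption rather than a confirmed fix. Hadoop has per-container environment properties (mapreduce.map.env, mapreduce.reduce.env, yarn.app.mapreduce.am.env) that take VAR=value entries, so LD_LIBRARY_PATH could be set for the task JVMs alone. The directory /opt/my_zlib and the helper name task_env_options are hypothetical, and the sketch assumes that directory holds a file literally named libz.so.1 (for example a symlink to my_libz.so.1) on every node. Whether the task JVM then resolves to it has not been verified here.

```python
def task_env_options(lib_dir):
    """Build -D options that set LD_LIBRARY_PATH for MapReduce containers.

    lib_dir is a hypothetical private directory assumed to contain a file
    named libz.so.1 that points at the custom zlib build.
    """
    value = "LD_LIBRARY_PATH=" + lib_dir
    # Environment properties for map tasks, reduce tasks, and the MR application master.
    props = [
        "mapreduce.map.env",
        "mapreduce.reduce.env",
        "yarn.app.mapreduce.am.env",
    ]
    return ["-D{}={}".format(prop, value) for prop in props]


if __name__ == "__main__":
    # Print the options so they can be pasted into a job submission command.
    print(" ".join(task_env_options("/opt/my_zlib")))
```

The printed options would go on the job submission command line, which only works for jobs that parse generic options (for example through ToolRunner); the same three properties could instead be set in mapred-site.xml.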
Any ideas or suggestions to accomplish this goal?
Many thanks in advance for any help.
Thanks, it does work now when I store files into HDFS. However, when I run a MapReduce job, Hadoop does not seem to follow the native zlib path shown in the output of "hadoop checknative", i.e., it does NOT use my library but still uses the system's library. Of course, if I force Hadoop to use my library with "ln -s -f /home/smjohn/lib/my_libz.so.1.1 libz.so.1", then it does end up using my library. But that is not what I want, as I need only Hadoop to use my zlib, not other applications. I know I can always recompile from the Hadoop source, but that is not a solution, as my cluster has quite a lot of nodes with different environments. So, any suggestions on how to make Hadoop/Hive use my zlib library for all MapReduce tasks? Thanks in advance for any help.
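To confirm which zlib a running task JVM has actually mapped, independently of what "hadoop checknative" reports, one option on Linux is to read /proc/<pid>/maps for that process. The sketch below is a small diagnostic under that assumption; the function names are made up for illustration, and the pid of the task JVM has to be found separately on the worker node (for example with ps).

```python
import sys


def mapped_zlib_paths(maps_text):
    """Return the distinct libz shared-object paths listed in /proc/<pid>/maps text."""
    paths = []
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (pathname is optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if "libz.so" in name and path not in paths:
            paths.append(path)
    return paths


def zlib_for_pid(pid):
    """Read the memory map of a live process and list the zlib files it has mapped."""
    with open("/proc/{}/maps".format(pid)) as fh:
        return mapped_zlib_paths(fh.read())


if __name__ == "__main__":
    # Usage: python check_zlib.py <pid of the task JVM>
    for path in zlib_for_pid(int(sys.argv[1])):
        print(path)
```

If the output shows a path under /lib/x86_64-linux-gnu rather than /home/smjohn/lib, the task is still picking up the system library.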
Hi, I am new and not sure if this is the right board for the following questions. We are planning to use our own implementation of zlib for default compression in Hadoop (for performance reasons), and our host OS is Linux (say Ubuntu).
1. Which default libraries does Hadoop use? On our OS, /lib/x86_64-linux-gnu/libz.so.1.2.8 is installed. Does this mean Hadoop uses this one as its default DEFLATE library?
2. If so, which Hadoop configuration files specify parameters for the DEFLATE algorithm (and libz.so.1.2.8)?
3. How does Hadoop use zlib, i.e., which source files contain lines related to using zlib, or which Hadoop source files define the APIs that call zlib?
Thanks for any help.