I am trying to use my own compression library for MapReduce in Hadoop, and have already set:
mapreduce.map.output.compress = true
mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.DefaultCodec
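For reference, this is how those two properties look in my mapred-site.xml (assuming the standard Hadoop configuration-file layout):

```xml
<!-- mapred-site.xml: compress intermediate map output with the
     zlib-backed DefaultCodec -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```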
But I want it to use my own zlib (a more efficient implementation).
I have already set LD_LIBRARY_PATH to point to my_libz, and "hadoop checknative" confirms that Hadoop sees it.
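Concretely, I set it roughly like this (/opt/my_libz is just a placeholder for wherever my_libz.so.1 actually lives on my machines):

```shell
# hadoop-env.sh (or the shell profile of the user running Hadoop);
# prepend the directory containing my_libz.so.1 to the search path
export LD_LIBRARY_PATH=/opt/my_libz:$LD_LIBRARY_PATH

# verify that Hadoop's native checks pick up the library
hadoop checknative -a
```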
However, MapReduce does not seem to honor LD_LIBRARY_PATH; it still loads the Linux system's libz.so.1. Of course, if I run "ln -s -f my_libz.so.1 libz.so.1", MapReduce has no choice but to use my_libz. That is not what I intend, though: I want only Hadoop to use my_libz.so.1, not every other application on the machine. And if I remove the system's libz.so instead, the following errors are generated:
Diagnostics: Exception from container-launch.
Container id: container_1547233003817_0045_02_000001
Exit code: 127
Stack trace: ExitCodeException exitCode=127:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
    at org.apache.hadoop.util.Shell.run(Shell.java:482)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 127
Failing this attempt. Failing the application.
19/01/11 11:49:59 INFO mapreduce.Job: Counters: 0
I could, of course, recompile Hadoop so that libhadoop.so links against my_libz.so.1, but that is not a real solution either: my Hadoop cluster spans too many different Linux environments, and building and distributing a custom libhadoop.so for each of them is too troublesome.
So the preferable solution would be to change some environment variable of the shell and/or Hadoop through configuration files, without any recompilation.
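For example, I had hoped that something along these lines in mapred-site.xml would propagate the path into the task containers (the /opt/my_libz path is a placeholder, and I am not sure these are the right properties, which is exactly my question):

```xml
<!-- mapred-site.xml: attempt to push LD_LIBRARY_PATH into the map and
     reduce task containers; /opt/my_libz is a placeholder path -->
<property>
  <name>mapreduce.map.env</name>
  <value>LD_LIBRARY_PATH=/opt/my_libz:$LD_LIBRARY_PATH</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>LD_LIBRARY_PATH=/opt/my_libz:$LD_LIBRARY_PATH</value>
</property>
```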
Any ideas or suggestions to accomplish this goal?
Many thanks in advance for any help.