
Set compression library path for MapReduce

Hi all,

I am trying to use my own compression library for MapReduce in Hadoop, and have already set

mapreduce.map.output.compress = true

mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.DefaultCodec

in mapred-site.xml.
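
For reference, the same two settings in XML form (just the values above, transcribed):

	<property>
	  <name>mapreduce.map.output.compress</name>
	  <value>true</value>
	</property>
	<property>
	  <name>mapreduce.map.output.compress.codec</name>
	  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
	</property>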

But I want to use my own zlib (implemented more efficiently).

I have already set LD_LIBRARY_PATH to point to my_libz, and Hadoop does see it via "hadoop checknative".
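
Roughly what I did (the my_libz path below is a placeholder for my actual install location):

	# make my zlib visible to the dynamic loader for this shell
	export LD_LIBRARY_PATH=/path/to/my_libz:$LD_LIBRARY_PATH
	# verify which native libraries Hadoop picks up
	hadoop checknative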

However, MapReduce does not seem to follow LD_LIBRARY_PATH; it still uses the Linux system's libz.so.1. Of course, if I run "ln -s -f my_libz.so.1 libz.so.1", MapReduce has to use my_libz. That is, however, not what I intend, since I want only Hadoop to use my_libz.so.1, not other applications. If I remove the system's libz.so, the following errors are generated:

	Diagnostics: Exception from container-launch.
	Container id: container_1547233003817_0045_02_000001
	Exit code: 127
	Stack trace: ExitCodeException exitCode=127:
	   at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
	   at org.apache.hadoop.util.Shell.run(Shell.java:482)
	   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
	   at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
	   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	   at java.lang.Thread.run(Thread.java:748)

	Container exited with a non-zero exit code 127
	Failing this attempt. Failing the application.

19/01/11 11:49:59 INFO mapreduce.Job: Counters: 0

I could, of course, recompile Hadoop to build a libhadoop.so that points to my_libz.so.1, but again this is not a solution: there are too many different Linux environments in my Hadoop cluster, and it would be too troublesome to build and distribute libhadoop.so for each of them.

So a preferable solution would be to change some environment variable of the shell and/or Hadoop through configuration files, without any recompilation.
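
For example, something along these lines in mapred-site.xml is what I have in mind (the path is a placeholder, and I have not verified that these per-task environment properties actually make the container's loader prefer my_libz):

	<property>
	  <name>mapreduce.map.env</name>
	  <value>LD_LIBRARY_PATH=/path/to/my_libz:$LD_LIBRARY_PATH</value>
	</property>
	<property>
	  <name>mapreduce.reduce.env</name>
	  <value>LD_LIBRARY_PATH=/path/to/my_libz:$LD_LIBRARY_PATH</value>
	</property>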

Any ideas or suggestions to accomplish this goal?

Many thanks in advance for any help.
