
set compression library path for mapreduce


Hi all,

I am trying to use my own compression library for MapReduce in Hadoop, and I have already set

mapreduce.map.output.compress = true,

mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.DefaultCodec

in mapred-site.xml.
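For reference, here is roughly how those entries look in my mapred-site.xml (a sketch; the surrounding `<configuration>` element and file layout may differ in your distribution):

```xml
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```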

But I want to use my own zlib (implemented more efficiently).

I have already set LD_LIBRARY_PATH to point at my_libz, and Hadoop does see it, as confirmed by "hadoop checknative".

However, MapReduce does not seem to honor LD_LIBRARY_PATH; it still uses the Linux system's libz.so.1. Of course, if I run "ln -s -f my_libz.so.1 libz.so.1", MapReduce has no choice but to use my_libz. This, however, is not what I intend, as I want only Hadoop to use my_libz.so.1, not other applications. If I remove the system's libz.so, the following errors are generated:

	Diagnostics: Exception from container-launch.
	Container id: container_1547233003817_0045_02_000001
	Exit code: 127
	Stack trace: ExitCodeException exitCode=127:
	   at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
	   at org.apache.hadoop.util.Shell.run(Shell.java:482)
	   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
	   at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
	   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	   at java.lang.Thread.run(Thread.java:748)
	Container exited with a non-zero exit code 127
	Failing this attempt. Failing the application.

19/01/11 11:49:59 INFO mapreduce.Job: Counters: 0

I could, of course, recompile Hadoop so that libhadoop.so links against my_libz.so.1, but this is not a solution either: there are too many different Linux environments in my Hadoop cluster, and building and distributing libhadoop.so for each of them would be too troublesome.

So a preferable solution would be to change some environment variable for the shell and/or Hadoop through configuration files, without recompilation.
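One direction I am considering (unverified; the property names are the standard Hadoop ones, but /opt/my_libz is just a placeholder for my library directory) is prepending my directory to the LD_LIBRARY_PATH of the task containers via mapred-site.xml:

```xml
<!-- Sketch: prepend a custom native-library directory for MapReduce
     tasks and the MR application master. /opt/my_libz is a placeholder
     and would need to exist on every node. -->
<property>
  <name>mapreduce.admin.user.env</name>
  <value>LD_LIBRARY_PATH=/opt/my_libz:$HADOOP_COMMON_HOME/lib/native</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>LD_LIBRARY_PATH=/opt/my_libz:$HADOOP_COMMON_HOME/lib/native</value>
</property>
```

I am not sure whether this actually affects which libz.so.1 the native code resolves at load time, though, which is why I am asking here.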

Any ideas or suggestions to accomplish this goal?

Many thanks in advance for any help.