replacing zlib

New Contributor

Hi, I am new here and not sure whether this is the right board for the following questions.

 

We are planning to use our own implementation of zlib for default compression in Hadoop (for performance reasons), and our host OS is Linux (say, Ubuntu).

 

1. Which default libraries does Hadoop use? On our OS, /lib/x86_64-linux-gnu/libz.so.1.2.8 is installed. Does this mean Hadoop uses this one as its default DEFLATE library?

 

2. If so, which Hadoop configuration files specify parameters for the DEFLATE algorithm (and libz.so.1.2.8)?

 

3. How does Hadoop use zlib, i.e., which source files contain lines related to using zlib, or which Hadoop source files define the APIs that call zlib?

 

Thanks for any help.

2 REPLIES

Rising Star

Hi,

You can use the command `hadoop checknative` to see which native libraries are being loaded. In the example below I check the native libraries, then change LD_LIBRARY_PATH and run the command again. In the second run you can see that libz is loaded from a different location.

 

[cloudera@quickstart ~]$ hadoop checknative
18/09/27 14:06:47 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
18/09/27 14:06:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4:     true revision:10301
bzip2:   true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
[cloudera@quickstart ~]$ cp /lib64/libz.so.1 lib/
[cloudera@quickstart ~]$ export LD_LIBRARY_PATH=$PWD/lib
[cloudera@quickstart ~]$ hadoop checknative
18/09/27 14:07:29 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
18/09/27 14:07:29 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /home/cloudera/lib/libz.so.1
snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4:     true revision:10301
bzip2:   true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
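
Note that `export LD_LIBRARY_PATH=$PWD/lib` replaces whatever value the variable already had. If other entries need to be kept, a minimal sketch of a variant that prepends the directory instead is shown here. `prepend_lib_path` is a hypothetical helper name, not part of Hadoop, and the usage lines assume the same paths as in the session above.

```shell
# Print a search path with DIR placed ahead of the current LD_LIBRARY_PATH.
# prepend_lib_path is a hypothetical helper, not part of Hadoop.
prepend_lib_path() {
    dir=$1
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        # Keep the existing entries, but let DIR be searched first.
        printf '%s:%s\n' "$dir" "$LD_LIBRARY_PATH"
    else
        printf '%s\n' "$dir"
    fi
}

# Usage (commented out because it needs a Hadoop installation):
#   mkdir -p "$HOME/lib" && cp /lib64/libz.so.1 "$HOME/lib/"
#   export LD_LIBRARY_PATH="$(prepend_lib_path "$HOME/lib")"
#   hadoop checknative
```

Because the dynamic loader searches LD_LIBRARY_PATH entries in order, the first directory that contains libz.so.1 is the one that should show up in the `zlib:` line.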

 

Hope this helps,

Jim

New Contributor

Thanks, it does work now when I store files in HDFS.

 

However, when I run MapReduce jobs, Hadoop does not seem to follow the native zlib path shown in the output of "hadoop checknative", i.e., it does NOT use my library but still uses the system's library.

Of course, if I force Hadoop to use my library with

"ln -s -f /home/smjohn/lib/my_libz.so.1.1 libz.so.1"

then it does end up using my library.

 

But that is not what I want: I need only Hadoop to use my zlib, not other applications. I know I could always recompile Hadoop from source, but that is not a practical solution, since my cluster has quite a lot of nodes and they have different environments.

 

So, are there any suggestions on how to make Hadoop/Hive use my zlib library for all MapReduce tasks?

 

Thanks in advance for any help.
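
One avenue that may be worth testing for the MapReduce case, offered as an untested sketch rather than a confirmed fix: map and reduce tasks run in their own processes on the worker nodes, so an LD_LIBRARY_PATH exported in the submitting shell does not necessarily reach them. The job properties `mapreduce.map.env` and `mapreduce.reduce.env` set environment variables for those task processes. The helper below only builds the corresponding `-D` options. `task_env_opts` is a hypothetical name, and the sketch assumes the directory passed in contains a file named libz.so.1 on every node.

```shell
# Build -D options that set LD_LIBRARY_PATH for map and reduce task processes.
# task_env_opts is a hypothetical helper; mapreduce.map.env and
# mapreduce.reduce.env are standard MapReduce job properties.
task_env_opts() {
    dir=$1
    printf '%s\n' \
        "-Dmapreduce.map.env=LD_LIBRARY_PATH=$dir" \
        "-Dmapreduce.reduce.env=LD_LIBRARY_PATH=$dir"
}

# Usage (commented out; my-job.jar, MyJob, in and out are placeholders, and
# the job must parse generic options, for example through ToolRunner):
#   hadoop jar my-job.jar MyJob $(task_env_opts /home/smjohn/lib) in out
```

This keeps the change scoped to the tasks of one job, so other applications on the nodes would keep using the system libz. Whether the tasks actually pick up the custom library would still need to be verified on the cluster.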