Created on 09-19-2018 01:17 PM - edited 09-16-2022 08:50 AM
Hi, I am new here and not sure whether this is the right board for the following questions.
We are planning to use our own implementation of zlib as the default compression library in Hadoop (for performance reasons), and our host OS is Linux (say, Ubuntu).
1. Which libraries does Hadoop use by default? On our OS, /lib/x86_64-linux-gnu/libz.so.1.2.8 is installed. Does this mean Hadoop uses it as its default DEFLATE library?
2. If so, which Hadoop configuration files specify parameters for the DEFLATE algorithm (and libz.so.1.2.8)?
3. How does Hadoop use zlib, i.e., which source files contain the code that uses zlib, or which Hadoop source files define the APIs that call zlib?
Thanks for any help.
Created 09-27-2018 07:30 AM
Hi,
You can use the command `hadoop checknative` to see which native libraries are being loaded. In the example below I check the native libraries, then change LD_LIBRARY_PATH and run the command again. In the second run you can see that libz is loaded from a different location.
[cloudera@quickstart ~]$ hadoop checknative
18/09/27 14:06:47 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
18/09/27 14:06:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
[cloudera@quickstart ~]$ cp /lib64/libz.so.1 lib/
[cloudera@quickstart ~]$ export LD_LIBRARY_PATH=$PWD/lib
[cloudera@quickstart ~]$ hadoop checknative
18/09/27 14:07:29 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
18/09/27 14:07:29 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /home/cloudera/lib/libz.so.1
snappy: true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
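`hadoop checknative` reports what the client shell resolves. To check which libz an already running process (for example a task JVM) has actually mapped, a minimal sketch along these lines may help. It assumes a Linux host with /proc available; the function name `loaded_zlib` and its optional second argument are made up for illustration and are not part of Hadoop.

```shell
# Sketch: list the distinct libz files a process has mapped.
# Usage: loaded_zlib PID [MAPS_FILE]
# MAPS_FILE defaults to /proc/PID/maps (Linux only). The optional second
# argument is a hypothetical hook so the function can be run on a saved copy.
loaded_zlib() {
  maps_file="${2:-/proc/$1/maps}"
  # The last field of a maps line is the backing file, when there is one.
  # Paths containing spaces are not handled by this simple sketch.
  awk '$NF ~ /libz\.so/ { print $NF }' "$maps_file" | sort -u
}
```

For instance, `loaded_zlib 12345` would print the libz path(s) mapped by the process with PID 12345 (the PID here is a placeholder). If nothing is printed, that process has not loaded a file whose name contains libz.so.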
Hope this helps,
Jim
Created 01-11-2019 07:04 AM
Thanks, it does work now when I store files into HDFS.
However, when I run MapReduce, Hadoop does not seem to follow the native zlib path shown in
the output of "hadoop checknative", i.e., it does NOT use my library but still uses the system's library.
Of course, if I force Hadoop to use my library with
" ln -s -f /home/smjohn/lib/my_libz.so.1.1 libz.so.1"
then it does end up using my library.
But that is not what I want, because I only need Hadoop to use my zlib, not other applications. I know I could always recompile Hadoop from source, but that is not a practical solution, as my cluster has quite a lot of nodes with different environments.
So, are there any suggestions on how to make Hadoop/Hive use my zlib library for all MapReduce tasks?
Thanks in advance for any help.