libhdfs missing

I'm currently using Hortonworks HDP 3.0.0.0-1634 (installed ~2 weeks ago). The system itself is great, but I can't seem to get libhdfs loaded into pyarrow, which makes ingestion difficult.

The libhdfs0 package is installed on the systems, but when I go looking for the actual .so file, all I find is a broken symlink (the libhdfs.so.0.0.0 target is missing):

root@use1-hadoop-5:~/compact# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 8.0K
lrwxrwxrwx 1 root root   16 Jul 12 21:06 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root root 4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root root 4.0K Sep 21 19:01 .

Am I missing something here?
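For what it's worth, a bare ctypes load should fail on that dangling symlink too, independent of pyarrow; a minimal check along these lines (using the HDP path from the listing above):

import ctypes

# The symlink points at a libhdfs.so.0.0.0 that doesn't exist, so this
# should raise OSError instead of loading the library.
ctypes.CDLL("/usr/hdp/3.0.0.0-1634/usr/lib/libhdfs.so")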

Example failure:

root@use1-hadoop-5:~/compact# python3 
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ["HADOOP_HOME"] = "/usr/hdp/current/hadoop-client"
>>> os.environ["JAVA_HOME"] = "/usr/jdk64/jdk1.8.0_112/"
>>> import subprocess
>>> classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hdfs", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
>>> os.environ["CLASSPATH"] = classpath.decode("utf-8")
>>> import pyarrow as pa
>>> fs = pa.hdfs.connect("use1-hadoop-namenode-1.datto.lan", 50070, user="hdfs")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/usr/local/lib/python3.5/dist-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
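For reference, my understanding is that pyarrow also honors the ARROW_LIBHDFS_DIR environment variable when it looks for the library, so once a real libhdfs.so exists I can point it at that directory explicitly. A sketch, reusing the paths from above:

import os

# Tell pyarrow exactly where libhdfs.so lives instead of relying on the
# HADOOP_HOME-based search (directory taken from the HDP layout above).
os.environ["ARROW_LIBHDFS_DIR"] = "/usr/hdp/3.0.0.0-1634/usr/lib"

import pyarrow as pa
fs = pa.hdfs.connect("use1-hadoop-namenode-1.datto.lan", 50070, user="hdfs")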

10 Replies

As a note, literally copying libhdfs.so from an Apache Hadoop binary distribution into the folder mentioned above fixes this problem...

That is:

  1. download the binary tarball from http://apache.claz.org/hadoop/common/hadoop-3.1.1/
  2. untar
  3. rsync -aP <folder>/lib/native/libhdfs.so* use1-hadoop-5:/usr/hdp/3.0.0.0-1634/usr/lib/
  4. profit!

root@use1-hadoop-5:~/compact# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 300K
-rwxr-xr-x 1 datto datto 291K Aug  2 04:31 libhdfs.so.0.0.0
lrwxrwxrwx 1 datto datto   16 Aug  2 04:31 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root  root  4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root  root  4.0K Sep 21 19:30 .

It turns out you can literally copy the file into place from a binary Hadoop build and clear that error. Unfortunately...

After copying the file into place, I seem to get a new error:

>>> import os
>>> os.environ["HADOOP_HOME"] = "/usr/hdp/current/hadoop-client"
>>> os.environ["JAVA_HOME"] = "/usr/jdk64/jdk1.8.0_112/"
>>> import subprocess
>>> classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hdfs", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
>>> os.environ["CLASSPATH"] = classpath.decode("utf-8")
>>> import pyarrow as pa                                                                               
>>> fs = pa.hdfs.connect("use1-hadoop-namenode-1.datto.lan", 50070, user="hdfs")
18/09/21 20:03:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/21 20:03:26 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>>> fs.df()
18/09/21 20:03:34 WARN net.NetUtils: Unable to wrap exception of type class org.apache.hadoop.ipc.RpcException: it has no (String) constructor
java.lang.NoSuchMethodException: org.apache.hadoop.ipc.RpcException.<init>(java.lang.String)
        at java.lang.Class.getConstructor0(Class.java:3082)
        at java.lang.Class.getConstructor(Class.java:1825)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:830)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1503)
        at org.apache.hadoop.ipc.Client.call(Client.java:1445)
        at org.apache.hadoop.ipc.Client.call(Client.java:1355)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy10.getFsStats(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getStats(ClientNamenodeProtocolTranslatorPB.java:705)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy11.getStats(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getStateByIndex(DFSClient.java:1921)
        at org.apache.hadoop.hdfs.DFSClient.getDiskStatus(DFSClient.java:1930)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getStatus(DistributedFileSystem.java:1373)
        at org.apache.hadoop.fs.FileSystem.getStatus(FileSystem.java:2803)
hdfsGetCapacity: FileSystem#getStatus error:
RpcException: RPC response exceeds maximum data length
java.io.IOException: Failed on local exception: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; Host Details : local host is: "use1-hadoop-5/10.40.80.91"; destination host is: "use1-hadoop-namenode-1.datto.lan":50070;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1503)
        at org.apache.hadoop.ipc.Client.call(Client.java:1445)
        at org.apache.hadoop.ipc.Client.call(Client.java:1355)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy10.getFsStats(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getStats(ClientNamenodeProtocolTranslatorPB.java:705)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/io-hdfs.pxi", line 194, in pyarrow.lib.HadoopFileSystem.df
  File "pyarrow/io-hdfs.pxi", line 170, in pyarrow.lib.HadoopFileSystem.get_capacity
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: HDFS GetCapacity failed, errno: 255 (Unknown error 255)

I currently have the IPC size limit set to around 1 GB and still get this error.
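The setting I mean is ipc.maximum.data.length. For completeness, my understanding is that pyarrow's connect() forwards an extra_conf dict to libhdfs as plain Hadoop configuration key/value pairs, so the same value can be tried from the client side as well; a rough sketch with the ~1 GB value mentioned above:

import pyarrow as pa

# ipc.maximum.data.length is the Hadoop IPC limit behind the
# "RPC response exceeds maximum data length" message; ~1 GB here.
conf = {"ipc.maximum.data.length": str(1024 * 1024 * 1024)}

fs = pa.hdfs.connect("use1-hadoop-namenode-1.datto.lan", 50070,
                     user="hdfs", extra_conf=conf)
fs.df()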

Master Mentor

@John Seekins

The HDP 3.0 installation itself ships the "libhdfs.so.0.0.0" binary at the correct, tested version. You should not need to download it separately from a third party, as that might cause conflicts.

# ls -lart  /usr/hdp/3.0.0.0-1634/usr/lib/
total 280
-rwxr-xr-x. 1 root root 286676 Jul 12 21:02 libhdfs.so.0.0.0
drwxr-xr-x. 4 root root     32 Jul 21 08:15 ..
lrwxrwxrwx. 1 root root     16 Jul 21 08:15 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x. 2 root root     48 Jul 21 08:15 .

My recommendation would be to perform a yum "reinstall" of the specific package.

We can see that "libhdfs.so.0.0.0" comes from the following repo/package:

# yum whatprovides '*libhdfs.so.0.0.0'
hadoop_3_0_0_0_1634-libhdfs-3.1.0.3.0.0.0-1634.x86_64 : Hadoop Filesystem Library
Repo  : HDP-3.0-repo-51
Matched from:
Filename  : /usr/hdp/3.0.0.0-1634/usr/lib/libhdfs.so.0.0.0

So please try reinstalling that package; that should pull in the missing file.

# yum reinstall "hadoop_3_0_0_0_1634-libhdfs-3.1.0.3.0.0.0-1634.x86_64"

Appreciate the suggestion, but I did actually try that:

root@use1-hadoop-5:~/ingest_hive# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 8.0K
lrwxrwxrwx 1 datto datto   16 Aug  2 04:31 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root  root  4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root  root  4.0K Sep 24 14:06 .
root@use1-hadoop-5:~/ingest_hive# dpkg -l | grep libhdfs
ii  libhdfs0                                        3.1.0.3.0.0.0-1634                         all          libhdfs0 is a virtual package that brings libhdfs0-3-0-0-0-1634 as a dependency.
ii  libhdfs0-3-0-0-0-1634                           3.1.0.3.0.0.0-1634                         amd64        Hadoop Filesystem Library
root@use1-hadoop-5:~/ingest_hive# apt-get install --reinstall libhdfs0 libhdfs0-3-0-0-0-1634
Reading package lists... Done
Building dependency tree       
Reading state information... Done
0 upgraded, 0 newly installed, 2 reinstalled, 0 to remove and 21 not upgraded.
Need to get 2,416 B of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://public-repo-1.hortonworks.com/HDP/ubuntu16/3.x/updates/3.0.0.0 HDP/main amd64 libhdfs0 all 3.1.0.3.0.0.0-1634 [1,006 B]
Get:2 http://public-repo-1.hortonworks.com/HDP/ubuntu16/3.x/updates/3.0.0.0 HDP/main amd64 libhdfs0-3-0-0-0-1634 amd64 3.1.0.3.0.0.0-1634 [1,410 B]
Fetched 2,416 B in 0s (15.1 kB/s)                 
[master 70a37c9] saving uncommitted changes in /etc prior to apt run
 3 files changed, 1 insertion(+), 1 deletion(-)
 rewrite hive/3.0.0.0-1634/0/hive-site.jceks (63%)
 rewrite oozie/3.0.0.0-1634/0/oozie-site.jceks (64%)
(Reading database ... 152223 files and directories currently installed.)
Preparing to unpack .../libhdfs0_3.1.0.3.0.0.0-1634_all.deb ...
Unpacking libhdfs0 (3.1.0.3.0.0.0-1634) over (3.1.0.3.0.0.0-1634) ...
Preparing to unpack .../libhdfs0-3-0-0-0-1634_3.1.0.3.0.0.0-1634_amd64.deb ...
Unpacking libhdfs0-3-0-0-0-1634 (3.1.0.3.0.0.0-1634) over (3.1.0.3.0.0.0-1634) ...
Setting up libhdfs0-3-0-0-0-1634 (3.1.0.3.0.0.0-1634) ...
Setting up libhdfs0 (3.1.0.3.0.0.0-1634) ...
root@use1-hadoop-5:~/ingest_hive# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 8.0K
lrwxrwxrwx 1 root root   16 Jul 12 21:06 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root root 4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root root 4.0K Sep 24 14:06 .

Contributor

As John described, the package seems to provide only a broken symlink. Any other hints on this topic would be highly appreciated, @Jay Kumar SenSharma.

It seems the issue still persists for John as well, as you can see in the follow-up question: https://community.hortonworks.com/questions/232464/libhdfs-problems.html

Any updates on this? libhdfs.so.0.0.0 is missing on 3.0.1.0-187 as well.

Contributor

I have the same problem on HDP 3.1.4.0-315.

What is the solution?

Contributor

@john_seekins: did you find a solution?