Created on 09-23-2018 01:47 PM - edited 09-16-2022 06:44 AM
I'm currently using Hortonworks 3.0.0.0-1634 (installed ~2 weeks ago). The system itself is great, but I can't seem to get libhdfs loaded into pyarrow, which makes ingestion difficult.
The libhdfs0 package is installed on the systems, but when I try to actually find the .so file, it is a broken symlink:
root@use1-hadoop-5:~/compact# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 8.0K
lrwxrwxrwx 1 root root   16 Jul 12 21:06 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root root 4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root root 4.0K Sep 21 19:01 .
Am I missing something here?
Example failure:
root@use1-hadoop-5:~/compact# python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ["HADOOP_HOME"] = "/usr/hdp/current/hadoop-client"
>>> os.environ["JAVA_HOME"] = "/usr/jdk64/jdk1.8.0_112/"
>>> import subprocess
>>> classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hdfs", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
>>> os.environ["CLASSPATH"] = classpath.decode("utf-8")
>>> import pyarrow as pa
>>> fs = pa.hdfs.connect("use1-hadoop-namenode-1.datto.lan", 50070, user="hdfs")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/usr/local/lib/python3.5/dist-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
Created 09-23-2018 01:47 PM
As a note, literally copying libhdfs.so from a Hadoop distribution into the folder mentioned above fixes this problem...
That is:
root@use1-hadoop-5:~/compact# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 300K
-rwxr-xr-x 1 datto datto 291K Aug  2 04:31 libhdfs.so.0.0.0
lrwxrwxrwx 1 datto datto   16 Aug  2 04:31 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root  root  4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root  root  4.0K Sep 21 19:30 .
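In code form, the workaround is roughly the following sketch (the source path is an assumption; point it at the lib/native directory of whatever Apache Hadoop binary build you have unpacked, matching the HDP Hadoop version, 3.1.0 here):

import shutil

# Assumed location of an unpacked Apache Hadoop 3.1.x binary tarball;
# adjust to wherever your copy actually lives.
src = "/opt/hadoop-3.1.0/lib/native/libhdfs.so.0.0.0"

# Directory where the HDP package left only the dangling libhdfs.so symlink.
dst = "/usr/hdp/3.0.0.0-1634/usr/lib/libhdfs.so.0.0.0"

shutil.copy(src, dst)  # after this, the existing libhdfs.so symlink resolves again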
Created 09-23-2018 01:47 PM
It turns out you can literally copy the file into place from a binary Hadoop build and clear that error. Unfortunately...
After copying the file into place, I seem to get a new error:
>>> import os
>>> os.environ["HADOOP_HOME"] = "/usr/hdp/current/hadoop-client"
>>> os.environ["JAVA_HOME"] = "/usr/jdk64/jdk1.8.0_112/"
>>> import subprocess
>>> classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hdfs", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
>>> os.environ["CLASSPATH"] = classpath.decode("utf-8")
>>> import pyarrow as pa
>>> fs = pa.hdfs.connect("use1-hadoop-namenode-1.datto.lan", 50070, user="hdfs")
18/09/21 20:03:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/21 20:03:26 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>>> fs.df()
18/09/21 20:03:34 WARN net.NetUtils: Unable to wrap exception of type class org.apache.hadoop.ipc.RpcException: it has no (String) constructor
java.lang.NoSuchMethodException: org.apache.hadoop.ipc.RpcException.<init>(java.lang.String)
        at java.lang.Class.getConstructor0(Class.java:3082)
        at java.lang.Class.getConstructor(Class.java:1825)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:830)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1503)
        at org.apache.hadoop.ipc.Client.call(Client.java:1445)
        at org.apache.hadoop.ipc.Client.call(Client.java:1355)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy10.getFsStats(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getStats(ClientNamenodeProtocolTranslatorPB.java:705)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy11.getStats(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getStateByIndex(DFSClient.java:1921)
        at org.apache.hadoop.hdfs.DFSClient.getDiskStatus(DFSClient.java:1930)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getStatus(DistributedFileSystem.java:1373)
        at org.apache.hadoop.fs.FileSystem.getStatus(FileSystem.java:2803)
hdfsGetCapacity: FileSystem#getStatus error:
RpcException: RPC response exceeds maximum data length
java.io.IOException: Failed on local exception: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length; Host Details : local host is: "use1-hadoop-5/10.40.80.91"; destination host is: "use1-hadoop-namenode-1.datto.lan":50070;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:816)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1503)
        at org.apache.hadoop.ipc.Client.call(Client.java:1445)
        at org.apache.hadoop.ipc.Client.call(Client.java:1355)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy10.getFsStats(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getStats(ClientNamenodeProtocolTranslatorPB.java:705)
        ...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/io-hdfs.pxi", line 194, in pyarrow.lib.HadoopFileSystem.df
  File "pyarrow/io-hdfs.pxi", line 170, in pyarrow.lib.HadoopFileSystem.get_capacity
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: HDFS GetCapacity failed, errno: 255 (Unknown error 255)
I currently have the IPC size set to around 1GB and still get this error.
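(One thing worth double-checking here: "RPC response exceeds maximum data length" often appears when an HDFS RPC client is pointed at a port that is not the NameNode's RPC port; 50070 is the NameNode web UI/WebHDFS port, whereas libhdfs speaks the RPC protocol. A minimal sketch, assuming the RPC port is the common HDP default of 8020 -- confirm against fs.defaultFS in core-site.xml on your cluster:)

import os
import subprocess

# Same environment setup as in the session above.
os.environ["HADOOP_HOME"] = "/usr/hdp/current/hadoop-client"
os.environ["JAVA_HOME"] = "/usr/jdk64/jdk1.8.0_112/"
os.environ["CLASSPATH"] = subprocess.check_output(
    ["/usr/hdp/current/hadoop-client/bin/hdfs", "classpath", "--glob"]
).decode("utf-8")

import pyarrow as pa

# 8020 is an assumption (the usual HDP NameNode RPC port); 50070 is the
# HTTP port and makes the RPC client fail with "RPC response exceeds
# maximum data length".
fs = pa.hdfs.connect("use1-hadoop-namenode-1.datto.lan", 8020, user="hdfs")
print(fs.df())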
Created 09-23-2018 03:06 PM
The HDP 3.0 installation itself ships the "libhdfs.so.0.0.0" binary at the correct, tested version. You do not need to download it separately from a third party; doing so might cause conflicts.
# ls -lart /usr/hdp/3.0.0.0-1634/usr/lib/
total 280
-rwxr-xr-x. 1 root root 286676 Jul 12 21:02 libhdfs.so.0.0.0
drwxr-xr-x. 4 root root     32 Jul 21 08:15 ..
lrwxrwxrwx. 1 root root     16 Jul 21 08:15 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x. 2 root root     48 Jul 21 08:15 .
The recommendation is to perform a yum "reinstall" of the specific package.
As we can see, "libhdfs.so.0.0.0" comes from the following repo/package:
# yum whatprovides '*libhdfs.so.0.0.0'
hadoop_3_0_0_0_1634-libhdfs-3.1.0.3.0.0.0-1634.x86_64 : Hadoop Filesystem Library
Repo        : HDP-3.0-repo-51
Matched from:
Filename    : /usr/hdp/3.0.0.0-1634/usr/lib/libhdfs.so.0.0.0
Hence, please try reinstalling that package; that should pull in the missing file.
# yum reinstall "hadoop_3_0_0_0_1634-libhdfs-3.1.0.3.0.0.0-1634.x86_64"
Created 09-24-2018 03:48 PM
Appreciate the suggestion, but I did actually try that:
root@use1-hadoop-5:~/ingest_hive# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 8.0K
lrwxrwxrwx 1 datto datto   16 Aug  2 04:31 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root  root  4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root  root  4.0K Sep 24 14:06 .
root@use1-hadoop-5:~/ingest_hive# dpkg -l | grep libhdfs
ii  libhdfs0                3.1.0.3.0.0.0-1634  all    libhdfs0 is a virtual package that brings libhdfs0-3-0-0-0-1634 as a dependency.
ii  libhdfs0-3-0-0-0-1634   3.1.0.3.0.0.0-1634  amd64  Hadoop Filesystem Library
root@use1-hadoop-5:~/ingest_hive# apt-get install --reinstall libhdfs0 libhdfs0-3-0-0-0-1634
Reading package lists... Done
Building dependency tree
Reading state information... Done
0 upgraded, 0 newly installed, 2 reinstalled, 0 to remove and 21 not upgraded.
Need to get 2,416 B of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://public-repo-1.hortonworks.com/HDP/ubuntu16/3.x/updates/3.0.0.0 HDP/main amd64 libhdfs0 all 3.1.0.3.0.0.0-1634 [1,006 B]
Get:2 http://public-repo-1.hortonworks.com/HDP/ubuntu16/3.x/updates/3.0.0.0 HDP/main amd64 libhdfs0-3-0-0-0-1634 amd64 3.1.0.3.0.0.0-1634 [1,410 B]
Fetched 2,416 B in 0s (15.1 kB/s)
[master 70a37c9] saving uncommitted changes in /etc prior to apt run
 3 files changed, 1 insertion(+), 1 deletion(-)
 rewrite hive/3.0.0.0-1634/0/hive-site.jceks (63%)
 rewrite oozie/3.0.0.0-1634/0/oozie-site.jceks (64%)
(Reading database ... 152223 files and directories currently installed.)
Preparing to unpack .../libhdfs0_3.1.0.3.0.0.0-1634_all.deb ...
Unpacking libhdfs0 (3.1.0.3.0.0.0-1634) over (3.1.0.3.0.0.0-1634) ...
Preparing to unpack .../libhdfs0-3-0-0-0-1634_3.1.0.3.0.0.0-1634_amd64.deb ...
Unpacking libhdfs0-3-0-0-0-1634 (3.1.0.3.0.0.0-1634) over (3.1.0.3.0.0.0-1634) ...
Setting up libhdfs0-3-0-0-0-1634 (3.1.0.3.0.0.0-1634) ...
Setting up libhdfs0 (3.1.0.3.0.0.0-1634) ...
root@use1-hadoop-5:~/ingest_hive# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 8.0K
lrwxrwxrwx 1 root root   16 Jul 12 21:06 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root root 4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root root 4.0K Sep 24 14:06 .
Created 02-06-2019 09:40 AM
As John described, the package seems to provide only a broken symlink. Any other hints on this topic would be highly appreciated @Jay Kumar SenSharma
It seems the issue still persists for John as well, as you can see in the follow-up question: https://community.hortonworks.com/questions/232464/libhdfs-problems.html
Created 01-09-2019 01:36 PM
Any updates on this? libhdfs.so.0.0.0 is missing on 3.0.1.0-187 as well
Created 01-21-2021 06:52 AM
I have the same problem on HDP 3.1.4.0-315
What is the solution?
Created 01-22-2021 05:39 AM
I have opened a new message for the same topic => https://community.cloudera.com/t5/Support-Questions/HDP-3-1-4-0-315-libhdfs-problem/td-p/310294
Created 02-02-2021 10:49 PM
@john_seekins: did you find a solution?