Created on 09-23-2018 01:47 PM - edited 09-16-2022 06:44 AM
I'm currently using Hortonworks 3.0.0.0-1634 (installed about two weeks ago). The system itself is great, but I can't get libhdfs loaded into pyarrow, which makes ingestion difficult.
The libhdfs0 package is installed on the systems, but when I try to actually find the .so file, it is a broken link:
root@use1-hadoop-5:~/compact# ls -larth /usr/hdp/3.0.0.0-1634/usr/lib/
total 8.0K
lrwxrwxrwx 1 root root   16 Jul 12 21:06 libhdfs.so -> libhdfs.so.0.0.0
drwxr-xr-x 4 root root 4.0K Sep 21 19:00 ..
drwxr-xr-x 2 root root 4.0K Sep 21 19:01 .
Am I missing something here?
Example failure:
root@use1-hadoop-5:~/compact# python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ["HADOOP_HOME"] = "/usr/hdp/current/hadoop-client"
>>> os.environ["JAVA_HOME"] = "/usr/jdk64/jdk1.8.0_112/"
>>> import subprocess
>>> classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hdfs", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
>>> os.environ["CLASSPATH"] = classpath.decode("utf-8")
>>> import pyarrow as pa
>>> fs = pa.hdfs.connect("use1-hadoop-namenode-1.datto.lan", 50070, user="hdfs")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pyarrow/hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "/usr/local/lib/python3.5/dist-packages/pyarrow/hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
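The dangling libhdfs.so link shown in the directory listing can also be confirmed from Python before pyarrow is involved at all. The following is a minimal sketch using only the standard library; the helper name `describe_link` is made up for illustration, and the path is the one from the listing in this post.

```python
import os

def describe_link(path):
    """Classify a path as 'missing', 'file', or a (possibly broken) symlink."""
    if os.path.islink(path):
        # os.path.exists follows symlinks, so it returns False for a dangling link.
        if not os.path.exists(path):
            return "broken symlink -> " + os.readlink(path)
        return "symlink -> " + os.path.realpath(path)
    if os.path.exists(path):
        return "file"
    return "missing"

# Path taken from the ls output in this post.
print(describe_link("/usr/hdp/3.0.0.0-1634/usr/lib/libhdfs.so"))
```

If this reports a broken symlink, the link target (libhdfs.so.0.0.0 here) is absent from that directory, and no loader will be able to open the library through that path.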
Created on 03-13-2021 10:31 AM - edited 03-13-2021 10:36 AM
On my version (6.3.3), libhdfs.so is not in CDH/lib/hadoop/lib, where it gets looked for, but out in CDH/lib64 for some reason:
cloudera/parcels/CDH/lib64/libhdfs.so
A symlink from hadoop/native out to lib64 would solve it.
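As an alternative to creating a symlink, pyarrow's documentation describes an `ARROW_LIBHDFS_DIR` environment variable that points at the directory containing libhdfs.so when it lives somewhere other than `$HADOOP_HOME/lib/native`. Below is a rough sketch of that approach, not a tested fix for either cluster in this thread. The helper `find_libhdfs_dir` is hypothetical, and the candidate directories are placeholders that need to be replaced with the real locations on the machine.

```python
import os

def find_libhdfs_dir(candidates):
    """Return the first directory in candidates holding a usable libhdfs.so."""
    for directory in candidates:
        # os.path.isfile follows symlinks, so a dangling link is skipped.
        if os.path.isfile(os.path.join(directory, "libhdfs.so")):
            return directory
    return None

# Placeholder locations (assumptions): substitute the real install paths.
candidates = [
    os.path.join(os.environ.get("HADOOP_HOME", ""), "lib", "native"),
    "/path/to/parcels/CDH/lib64",
]

found = find_libhdfs_dir(candidates)
if found is not None:
    # Must be set before pyarrow tries to connect to HDFS.
    os.environ["ARROW_LIBHDFS_DIR"] = found
```

The variable has to be set in the same process, before the `pa.hdfs.connect(...)` call, alongside `HADOOP_HOME`, `JAVA_HOME`, and `CLASSPATH` as in the original session.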