Created on 04-05-2018 04:41 PM - edited 08-17-2019 08:06 AM
Operational Challenges
Legacy applications that cannot use the REST interface for HDFS have limited access to the data stored in the cluster.
One way to deal with this is to expose HDFS as a mountable filesystem. The Linux kernel does not yet provide native HDFS access, so we need to use “user space” capabilities, with something like FUSE, to provide this functionality: https://en.wikipedia.org/wiki/Filesystem_in_Userspace
But before we begin, we need to understand some limitations of the HDFS-FUSE implementation:
So with those limitations understood, let's begin getting things set up on our cluster.
Installation Procedure
To begin with, we need to install the packages from the HDP repositories. This article focuses on HDP 2.6.4, but the same holds true for earlier releases (HDP 2.5.0 has been tested and works similarly). We also assume that users have been added to the cluster and have access to both local directories and HDFS storage.
As root (or with elevated privileges), install the requisite packages.
[root@scratch ~]# yum install hadoop-hdfs-fuse
NOTE:
You may need to verify that both “PATH” and “LD_LIBRARY_PATH” include the locations of the requisite Hadoop & Java libraries and executables.
If using the Oracle JDK & HDP 2.6.4, they might look similar to this:
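As a minimal sketch of what such settings could look like, the snippet below prepends the Java and Hadoop locations to both variables. Every directory in it is an assumption rather than a value taken from this article: /usr/java/default, /usr/hdp/current/hadoop-client, and the HDFS_NATIVE_DIR variable are placeholders, so confirm where libjvm.so and libhdfs.so actually live on your host before using it.

```shell
# Assumed install locations (hypothetical; verify on your own host).
JAVA_HOME="${JAVA_HOME:-/usr/java/default}"
HADOOP_HOME="${HADOOP_HOME:-/usr/hdp/current/hadoop-client}"

# Assumed directory holding libhdfs.so. Locate the real one with, for example:
#   find /usr/hdp -name 'libhdfs.so*'
HDFS_NATIVE_DIR="${HDFS_NATIVE_DIR:-${HADOOP_HOME}/lib/native}"

# Put the Java and Hadoop executables on PATH, keeping the existing entries.
PATH="${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${PATH}"

# Make libjvm.so (JDK 8 layout on x86_64) and libhdfs.so resolvable.
# The ${VAR:+...} form avoids a trailing ':' when LD_LIBRARY_PATH is unset.
LD_LIBRARY_PATH="${JAVA_HOME}/jre/lib/amd64/server:${HDFS_NATIVE_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

export JAVA_HOME HADOOP_HOME PATH LD_LIBRARY_PATH
```

These lines could go in root's shell profile or at the top of whatever script performs the mount.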
Now we need to create our mount point on the Linux filesystem that users can access.
This example uses a single active NameNode.
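The two steps above (create the mount point, then mount HDFS against a single NameNode) might be sketched as follows. The host name namenode.example.com, port 8020, the scratch path /tmp/hdfs-fuse-demo, and the DO_MOUNT switch are all hypothetical values chosen for illustration; a real deployment would more likely use a dedicated directory and must run the mount as root. By default the sketch only prints the mount command so that it can be reviewed first.

```shell
# Hypothetical values; replace with your NameNode host, RPC port, and mount path.
NAMENODE_HOST="${NAMENODE_HOST:-namenode.example.com}"
NAMENODE_PORT="${NAMENODE_PORT:-8020}"
MOUNT_POINT="${MOUNT_POINT:-/tmp/hdfs-fuse-demo}"

# Create the mount point and make it traversable by regular users.
mkdir -p "${MOUNT_POINT}"
chmod 755 "${MOUNT_POINT}"

# Build the mount command; hadoop-fuse-dfs comes from the hadoop-hdfs-fuse package.
MOUNT_CMD="hadoop-fuse-dfs dfs://${NAMENODE_HOST}:${NAMENODE_PORT} ${MOUNT_POINT}"

if [ "${DO_MOUNT:-0}" = "1" ]; then
    # Real mount: run as root on a host that can reach the NameNode.
    # Left unquoted on purpose so the string splits into command and arguments.
    ${MOUNT_CMD}
else
    # Dry run: only show what would be executed.
    echo "${MOUNT_CMD}"
fi
```

Once the mount succeeds, listing the mount point with ls should show the top-level HDFS directories.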
Addendum
Once you are satisfied with the location and permissions, you can have this mount at boot (the line below is in /etc/fstab format) or run it as part of a secondary startup script (e.g. rc.local, if it is enabled on CentOS/RHEL 7+) to mount on reboot. But it is best to wait until the NameNode is up and running before you proceed with this automation.
hadoop-fuse-dfs#dfs://<name_node_hostname>:<namenode_port> <mount_point> fuse allow_other,usetrash,rw 2 0
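One way to honor the "wait for the NameNode" advice in a startup script is to poll the NameNode port before mounting. The helper below is a rough sketch, not part of the HDP packages: wait_for_port is a hypothetical function name, and it assumes bash and the coreutils timeout command are available. A successful TCP connection only shows that the port is listening; it does not prove the NameNode has left safe mode.

```shell
# Return 0 once host:port accepts a TCP connection,
# or 1 after max_tries failed attempts.
wait_for_port() {
    host="$1"
    port="$2"
    max_tries="${3:-30}"   # number of connection attempts
    delay="${4:-10}"       # seconds to sleep between attempts
    try=0
    while [ "${try}" -lt "${max_tries}" ]; do
        # bash's /dev/tcp pseudo-device opens a TCP connection;
        # timeout guards against hosts that silently drop packets.
        if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
            return 0
        fi
        try=$((try + 1))
        # Sleep only if another attempt will follow.
        [ "${try}" -lt "${max_tries}" ] && sleep "${delay}"
    done
    return 1
}

# Example use in a startup script (hypothetical host, port, and mount point;
# "mount /hdfs" relies on a matching /etc/fstab entry):
#   wait_for_port namenode.example.com 8020 && mount /hdfs
```

With the defaults, the helper gives the NameNode roughly five minutes to start listening before giving up.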
For more information on how this works, see the Apache Hadoop page for Mountable HDFS:
https://wiki.apache.org/hadoop/MountableHDFS
My next article on this will cover how this can work with NameNode-HA and a secured cluster with Kerberos (Authentication) and LDAP (Ranger Authorization).