A remote Linux system can use NFS (Network File System) to mount an HDFS file system and interact with it. Before proceeding, understand that your Linux instance accesses HDFS directly over the network, so you will incur network latency. Depending on your dataset size, you could be processing gigabytes or more of data on a single machine, so this is not the best approach for large datasets.
These steps show you how to mount a remote HDFS file system and interact with it from your Linux system:
1) The Linux system must have the NFS client utilities installed (CentOS is used for this demo)
yum install nfs-utils nfs-utils-lib
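On CentOS the NFSv3 mount used later also expects the rpcbind service to be running on the client; a minimal check, assuming a systemd-based release (older releases use service rpcbind start instead):
systemctl enable rpcbind
systemctl start rpcbind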
2) Your HDP cluster must have an NFS Gateway installed (Ambari lets you add one with a single click)
* Keep track of either the FQDN or the IP address of the NFS Gateway
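Before mounting, you can confirm the gateway is reachable from your Linux system by listing its exports with showmount (part of nfs-utils); replace the placeholder host with your gateway's FQDN or IP:
showmount -e myipaddressorfqdnofnfsgateway
You should see the HDFS root export ("/") in the output.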
3) In Ambari, under HDFS > Advanced > General set Access time precision = 3600000
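For reference, this Ambari field corresponds to the dfs.namenode.accesstime.precision property in hdfs-site.xml (value in milliseconds, so 3600000 = 1 hour). If you manage configs by hand, the equivalent snippet looks like this, and the NameNode must be restarted after changing it:
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
</property>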
4) Mount the NFS Gateway on your Linux system (must be run as root)
mount -t nfs -o vers=3,proto=tcp,nolock myipaddressorfqdnofnfsgateway:/ /opt/remotedirectory
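The mount point must exist before you run the mount command, and you can confirm the mount succeeded afterwards:
mkdir -p /opt/remotedirectory    # create the mount point first (run before the mount command above)
df -h /opt/remotedirectory       # the NFS gateway should show up as the mounted filesystem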
5) On both your HDFS node and your remote Linux system, add the same user with the same uid (making sure the user does not already exist on either system)
useradd -u 1234 testuser
* If the username/uid does not match between the HDFS node and your remote Linux system, whichever uid you are logged in as on the remote Linux system is passed to and interpreted by the NFS Gateway. For example, if your Linux system has usertest (uid = 501) and you write a file to HDFS's /tmp, the owner of that file will be whichever user on the HDFS node has uid=501. It is therefore good practice to match both the username and the uid across both systems.
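A quick way to confirm the mapping is to run id on both machines and compare the uid values (both should show 1234 for this example):
id testuser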
6) On your remote Linux system, log in as your "testuser" and go to your mounted NFS directory
cd /opt/remotedirectory
You will now be able to interact with HDFS using native Linux commands such as cp, less, and more, as shown in the example below.
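For example, assuming the HDFS /tmp directory is writable by testuser (the file names below are illustrative), copying a file in and reading it back looks like ordinary local file access:
cp /etc/hosts /opt/remotedirectory/tmp/hosts-copy
ls -l /opt/remotedirectory/tmp/hosts-copy
less /opt/remotedirectory/tmp/hosts-copy
From any HDFS client node you can verify the same file with hdfs dfs -ls /tmp/hosts-copy.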