Created on 10-02-2015 08:48 PM
httpfs is needed to provide a centralized WebHDFS interface to an HA-enabled NameNode cluster. It can be used by Hue or any other WebHDFS-enabled client that needs to work with a cluster configured with a High-Availability NameNode.
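As an illustration (not part of the install itself), Hue can be pointed at the httpfs endpoint instead of a single NameNode; the host and port in this hue.ini snippet are placeholders for your environment:
# hue.ini (illustrative snippet; adjust host/port to your setup)
[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      # One stable WebHDFS URL, even when the active NameNode fails over
      webhdfs_url=http://m1.hdp.local:14000/webhdfs/v1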
The installation is a piece of cake:
yum install hadoop-httpfs
But that's where the fun ends!!! Configuring it is a whole other thing. It's not hard, if you know the right buttons to push. Unfortunately, the buttons and directions for doing this can be quite elusive.
The httpfs service is a Tomcat application that relies on having the Hadoop libraries and configuration available so it can resolve your HDP installation.
The installation (above) lays down a few items:
/usr/hdp/2.2.x.x-x/hadoop-httpfs
/etc/hadoop-httpfs/conf
/etc/hadoop-httpfs/tomcat-deployment
Set the version for current with
hdp-select set hadoop-httpfs 2.2.x.x-x
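If you're not sure of the exact version string, hdp-select can list what's installed (the subcommands below are assumed available on HDP nodes):
# List installed HDP versions and what 'current' points to for this component
hdp-select versions
hdp-select status hadoop-httpfs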
From this point on, many of our changes are designed to "fix" the "hardcoded" implementations in the deployed scripts. Replace /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh with the following wrapper, which delegates to the shipped httpfs.sh.distro:
#!/bin/bash

# Autodetect JAVA_HOME if not defined
if [ -e /usr/libexec/bigtop-detect-javahome ]; then
  . /usr/libexec/bigtop-detect-javahome
elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
  . /usr/lib/bigtop-utils/bigtop-detect-javahome
fi

### Added to assist with locating the right configuration directory
export HTTPFS_CONFIG=/etc/hadoop-httpfs/conf

### Remove the original HARD CODED version reference... I mean, really???
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec

exec /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh.distro "$@"
Now let's create a few symlinks to connect the pieces together:
cd /usr/hdp/current/hadoop-httpfs
ln -s /etc/hadoop-httpfs/tomcat-deployment/conf conf
ln -s ../hadoop/libexec libexec
Like all the other Hadoop components, httpfs uses *-env.sh files to control the startup environment. Above, in the httpfs.sh script, we set the location of the configuration directory; that is used to find and load the httpfs-env.sh file we'll modify below.
# Add these to control and set the Catalina directories for starting and finding the httpfs application
export CATALINA_BASE=/usr/hdp/current/hadoop-httpfs
export HTTPFS_CATALINA_HOME=/etc/hadoop-httpfs/tomcat-deployment

# Set a log directory that matches your standards
export HTTPFS_LOG=/var/log/hadoop/httpfs

# Set a tmp directory for httpfs to store interim files
export HTTPFS_TEMP=/tmp/httpfs
That's it!! Now run it!
cd /usr/hdp/current/hadoop-httpfs/sbin
./httpfs.sh start

# To Stop
./httpfs.sh stop
http://m1.hdp.local:14000/webhdfs/v1/user?user.name=hdfs&op=LISTSTATUS
Obviously, swap in your target host. The default port is 14000. If you want to change it, add the following to httpfs-env.sh:
export HTTPFS_HTTP_PORT=<new_port>
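A quick sanity check from any machine with curl, reusing the example host and user from the URL above:
# A healthy service returns a JSON FileStatuses listing of /user
curl "http://m1.hdp.local:14000/webhdfs/v1/user?user.name=hdfs&op=LISTSTATUS"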
The HDP installation puts a set of init.d files in the specific versions directory.
cd /usr/hdp/<hdp.version>/etc/rc.d/init.d
Create a symlink to the hadoop-httpfs script in /etc/init.d:
ln -s /usr/hdp/<hdp.version>/etc/rc.d/init.d/hadoop-httpfs /etc/init.d/hadoop-httpfs
Then set up the service to start on reboot:
# As Root User
chkconfig --add hadoop-httpfs
# Start Service
service hadoop-httpfs start

# Stop Service
service hadoop-httpfs stop
This method will run the service as the 'httpfs' user. Ensure that the 'httpfs' user has permissions to write to the log directory (/var/log/hadoop/httpfs if you followed these directions).
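A minimal sketch for that, assuming the log and tmp paths from the env file above and a 'hadoop' group (adjust ownership to match your environment):
# Create the directories referenced by HTTPFS_LOG and HTTPFS_TEMP
mkdir -p /var/log/hadoop/httpfs /tmp/httpfs
# 'hadoop' is an assumed group; use whatever your cluster standardizes on
chown -R httpfs:hadoop /var/log/hadoop/httpfs /tmp/httpfs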
A Little More Detail
Proxies are fun, aren't they? Well, they'll affect you here as well. This setup relies on the following proxyuser settings in core-site.xml:
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>

This means that httpfs.sh must be run as the 'httpfs' user in order to work. If you want to run the service as another user, adjust the proxy settings accordingly. For example, to run it as root:
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
Created on 07-18-2016 06:33 AM
Hi, I am unable to start my Hue browser. Could that be due to httpfs?
Created on 08-29-2017 07:35 PM
If the cluster is Kerberized, use kadmin to create a principal and the keytab file. Once the keytab is created, add a rule for this user in auth_to_local. Ensure the httpfs proxyuser config is present, and then update the httpfs conf file to reference the keytab.
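A rough sketch of those steps (realm, host, keytab path, and group are assumptions; adjust for your KDC and environment):
# Create service principals and export them into a keytab (MIT Kerberos kadmin).
# In a Kerberized cluster HTTP/<host> may already exist; skip creating it if so.
kadmin -q "addprinc -randkey httpfs/m1.hdp.local@EXAMPLE.COM"
kadmin -q "addprinc -randkey HTTP/m1.hdp.local@EXAMPLE.COM"
kadmin -q "xst -k /etc/security/keytabs/httpfs.service.keytab httpfs/m1.hdp.local@EXAMPLE.COM HTTP/m1.hdp.local@EXAMPLE.COM"
chown httpfs:hadoop /etc/security/keytabs/httpfs.service.keytab
chmod 400 /etc/security/keytabs/httpfs.service.keytab

# Then reference the keytab and principals in httpfs-site.xml, e.g.:
#   httpfs.authentication.type                      = kerberos
#   httpfs.authentication.kerberos.principal        = HTTP/m1.hdp.local@EXAMPLE.COM
#   httpfs.authentication.kerberos.keytab           = /etc/security/keytabs/httpfs.service.keytab
#   httpfs.hadoop.authentication.type               = kerberos
#   httpfs.hadoop.authentication.kerberos.principal = httpfs/m1.hdp.local@EXAMPLE.COM
#   httpfs.hadoop.authentication.kerberos.keytab    = /etc/security/keytabs/httpfs.service.keytab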
Created on 09-13-2019 06:45 AM
Hi @dstreev
I want to use an HTTPFS node in HDP 3.0 to access HDFS. Could you please let me know if there is a URL for this, or whether I need to install HTTPFS? If I do need to install HTTPFS, are the above steps applicable to HDP 3.0?
Best regards,
Created on 09-16-2019 12:26 AM
Very rewarding! Thanks
Created on 09-17-2019 11:02 AM
Hi @dstreev
Thanks for your article. Correct me if I'm wrong, but couldn't the same thing be done using the Knox service, which comes by default with HDP? Is that correct?
Or does this service offer some extra feature?
Regards
Gerard