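httpfs is needed to provide a single, centralized WebHDFS endpoint for a cluster with NameNode HA enabled. A WebHDFS client normally talks to one NameNode host, and with HA the active NameNode can change. httpfs sits in front of the cluster and uses the HA client configuration to reach whichever NameNode is active. Hue, or any other WebHDFS-enabled client, can use it to work with a cluster configured with a High-Availability NameNode.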

The installation is a piece of cake:

yum install hadoop-httpfs

But that's where the fun ends!!! Configuring it is a whole other thing. It's not hard if you know the right buttons to push. Unfortunately, the buttons and the directions for pushing them can be quite obscure.

The httpfs service is a Tomcat application that relies on the Hadoop libraries and configuration being available, so that it can resolve your HDP installation.

The installation above puts a few items in place:

/usr/hdp/2.2.x.x-x/hadoop-httpfs
/etc/hadoop-httpfs/conf
/etc/hadoop-httpfs/tomcat-deployment

Configuring - The Short Version

Point the "current" link for httpfs (/usr/hdp/current/hadoop-httpfs) at the installed version with:

hdp-select set hadoop-httpfs 2.2.x.x-x
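If you'd rather not type the version string by hand, a small helper can pick it from the directories under /usr/hdp. This is a minimal sketch, not part of HDP: latest_hdp_version is a hypothetical name, and it assumes the versioned directories sit directly under /usr/hdp (as in the layout above) and that GNU sort -V is available. With several versions installed it picks the highest one, which may not be the one you want, so check the result before running hdp-select.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with HDP): print the highest version
# directory under an HDP root, default /usr/hdp. Only names starting with a
# digit are considered, so the "current" directory is skipped.
latest_hdp_version() {
  local root="${1:-/usr/hdp}"
  local d
  for d in "$root"/[0-9]*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    basename "$d"
  done | sort -V | tail -n 1
}

# Usage (assumes hdp-select is on the PATH):
#   version="$(latest_hdp_version)"
#   [ -n "$version" ] && hdp-select set hadoop-httpfs "$version"
```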

From this point on, many of our changes are designed to "fix" the "hardcoded" implementations in the deployed scripts.

Edit the /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh wrapper script so it looks like this (the ### comments mark the changes):

#!/bin/bash
# Autodetect JAVA_HOME if not defined
if [ -e /usr/libexec/bigtop-detect-javahome ]; then
  . /usr/libexec/bigtop-detect-javahome
elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
  . /usr/lib/bigtop-utils/bigtop-detect-javahome
fi
### Added to assist with locating the right configuration directory
export HTTPFS_CONFIG=/etc/hadoop-httpfs/conf
### Replaces the original HARD CODED version reference...  I mean, really???
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec

exec /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh.distro "$@"

Now let's create a few symlinks to connect the pieces together

cd /usr/hdp/current/hadoop-httpfs
ln -s /etc/hadoop-httpfs/tomcat-deployment/conf conf
ln -s ../hadoop/libexec libexec
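Both links are easy to get wrong, and ln will happily create a dangling link if the target path has a typo. A minimal sketch of a safer variant: link_or_fail is a hypothetical helper that checks the target exists before linking, resolving relative targets such as ../hadoop/libexec against the link's own directory, the same way they are resolved when the link is followed.

```shell
#!/bin/bash
# Hypothetical helper: create (or replace) a symlink, but refuse to do so if
# the target doesn't exist, so a typo can't leave a dangling link behind.
link_or_fail() {
  local target="$1" link="$2" resolved
  case "$target" in
    /*) resolved="$target" ;;                     # absolute target
    *)  resolved="$(dirname "$link")/$target" ;;  # relative to the link's dir
  esac
  if [ ! -e "$resolved" ]; then
    echo "link target not found: $resolved" >&2
    return 1
  fi
  # -f replaces an existing link; -n stops ln from descending into a link
  # that already points at a directory.
  ln -sfn "$target" "$link"
}

# Usage, mirroring the two commands above:
#   cd /usr/hdp/current/hadoop-httpfs
#   link_or_fail /etc/hadoop-httpfs/tomcat-deployment/conf conf
#   link_or_fail ../hadoop/libexec libexec
```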
 

Like all the other Hadoop components, httpfs uses *-env.sh files to control its startup environment. The HTTPFS_CONFIG setting in the httpfs.sh script above sets the location of the configuration directory, and that is where the httpfs-env.sh file we'll modify below is found and loaded (/etc/hadoop-httpfs/conf/httpfs-env.sh). Add the following to it:

# Add these to control and set the Catalina directories for starting and finding the httpfs application
export CATALINA_BASE=/usr/hdp/current/hadoop-httpfs
export HTTPFS_CATALINA_HOME=/etc/hadoop-httpfs/tomcat-deployment
 
# Set a log directory that matches your standards
export HTTPFS_LOG=/var/log/hadoop/httpfs
 
# Set a tmp directory for httpfs to store interim files
export HTTPFS_TEMP=/tmp/httpfs
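A quick way to confirm the file parses and sets what you expect is to source it in a subshell and print the values. This is a sketch: show_httpfs_env is a hypothetical helper, and its default path assumes HTTPFS_CONFIG=/etc/hadoop-httpfs/conf as set in httpfs.sh above.

```shell
#!/bin/bash
# Hypothetical helper: source an httpfs-env.sh in a subshell (so the caller's
# environment is untouched) and print the variables this article sets.
show_httpfs_env() {
  local env_file="${1:-/etc/hadoop-httpfs/conf/httpfs-env.sh}"
  (
    # shellcheck disable=SC1090
    . "$env_file" || exit 1   # a missing or failing file makes the call fail
    for var in CATALINA_BASE HTTPFS_CATALINA_HOME HTTPFS_LOG HTTPFS_TEMP; do
      printf '%s=%s\n' "$var" "${!var}"   # indirect expansion: value of $var
    done
  )
}

# Usage:
#   show_httpfs_env
```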

That's it!! Now run it!

cd /usr/hdp/current/hadoop-httpfs/sbin
./httpfs.sh start
 
# To Stop
./httpfs.sh stop 

Try it out!!

http://m1.hdp.local:14000/webhdfs/v1/user?user.name=hdfs&op=LISTSTATUS
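From the command line, curl does the same job. A minimal sketch: webhdfs_url is a hypothetical helper that only assembles the URL pattern shown above (host, port, /webhdfs/v1 plus the HDFS path, then user.name and op), using the same simple user.name authentication as the example. It does no URL encoding, so keep paths simple.

```shell
#!/bin/bash
# Hypothetical helper: build an httpfs/WebHDFS REST URL.
#   $1 host, $2 port, $3 HDFS path, $4 operation, $5 user (user.name auth)
webhdfs_url() {
  local host="$1" port="$2" path="$3" op="$4" user="$5"
  case "$path" in
    /*) ;;                 # already absolute
    *)  path="/$path" ;;   # WebHDFS paths must start with a slash
  esac
  printf 'http://%s:%s/webhdfs/v1%s?user.name=%s&op=%s\n' \
    "$host" "$port" "$path" "$user" "$op"
}

# Usage: list /user as hdfs (quote the URL so the shell leaves '&' alone).
#   curl -s "$(webhdfs_url m1.hdp.local 14000 /user LISTSTATUS hdfs)"
```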

Obviously, replace m1.hdp.local with your target host. The default port is 14000. If you want to change it, add the following to httpfs-env.sh:

export HTTPFS_HTTP_PORT=<new_port>
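If you script your setup, it helps to make that edit idempotent so re-running it doesn't stack up duplicate lines. A minimal sketch: set_env_var is a hypothetical helper, it relies on GNU sed -i (standard on RHEL/CentOS), and it assumes plain values such as a port number that contain no | or & characters, which sed would treat specially. Remember to restart httpfs and use the new port in your URLs afterwards.

```shell
#!/bin/bash
# Hypothetical helper: set "export NAME=VALUE" in an env file, replacing an
# existing assignment if there is one, otherwise appending it.
# Assumes VALUE contains no '|' or '&' (special in the sed replacement).
set_env_var() {
  local file="$1" name="$2" value="$3"
  touch "$file" || return 1
  if grep -q "^export ${name}=" "$file"; then
    sed -i "s|^export ${name}=.*|export ${name}=${value}|" "$file"
  else
    printf 'export %s=%s\n' "$name" "$value" >> "$file"
  fi
}

# Usage (14001 is just an example port):
#   set_env_var /etc/hadoop-httpfs/conf/httpfs-env.sh HTTPFS_HTTP_PORT 14001
```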

Want to Add httpfs as a Service (auto-start)?

The HDP installation puts a set of init.d scripts in each version-specific directory:

cd /usr/hdp/<hdp.version>/etc/rc.d/init.d

Create a symlink to the hadoop-httpfs script from /etc/init.d:

ln -s /usr/hdp/<hdp.version>/etc/rc.d/init.d/hadoop-httpfs /etc/init.d/hadoop-httpfs

Then register the service with chkconfig so it starts again after a reboot:

# As Root User
chkconfig --add hadoop-httpfs
# Start Service
service hadoop-httpfs start
 
# Stop Service
service hadoop-httpfs stop
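The start command may return before Tomcat has finished deploying the httpfs application, so a script that depends on httpfs may want to wait until it actually answers. A minimal sketch, assuming curl is installed: wait_for_httpfs is a hypothetical helper that polls a URL (for example the LISTSTATUS call from earlier) once a second until it returns HTTP 200 or the attempts run out.

```shell
#!/bin/bash
# Hypothetical helper: poll a URL until it returns HTTP 200.
#   $1 URL, $2 number of attempts (default 30, one second apart)
wait_for_httpfs() {
  local url="$1" tries="${2:-30}" i code
  for ((i = 1; i <= tries; i++)); do
    # -w prints only the status code; a refused connection yields "000".
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")"
    [ "$code" = "200" ] && return 0
    sleep 1
  done
  echo "httpfs did not answer 200 at $url after $tries attempts" >&2
  return 1
}

# Usage:
#   service hadoop-httpfs start
#   wait_for_httpfs "http://m1.hdp.local:14000/webhdfs/v1/user?user.name=hdfs&op=LISTSTATUS"
```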

This method will run the service as the 'httpfs' user. Ensure that the 'httpfs' user has permissions to write to the log directory (/var/log/hadoop/httpfs if you followed these directions).
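To set that up in one go, a sketch like the following creates the directories named in httpfs-env.sh and hands them to the service user. prepare_httpfs_dirs is a hypothetical helper; it falls back to the paths used in this article when HTTPFS_LOG and HTTPFS_TEMP aren't set, and it only runs chown when executed as root and the target user exists on the host.

```shell
#!/bin/bash
# Hypothetical helper: create the httpfs log and temp directories and, when
# possible, give them to the service user (default "httpfs").
prepare_httpfs_dirs() {
  local owner="${1:-httpfs}"
  local log_dir="${HTTPFS_LOG:-/var/log/hadoop/httpfs}"
  local tmp_dir="${HTTPFS_TEMP:-/tmp/httpfs}"
  mkdir -p "$log_dir" "$tmp_dir" || return 1
  # chown only works as root, and only makes sense if the user exists.
  if [ "$(id -u)" -eq 0 ] && id -u "$owner" >/dev/null 2>&1; then
    chown "$owner" "$log_dir" "$tmp_dir"
  fi
}

# Usage (as root):
#   prepare_httpfs_dirs httpfs
```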

A Little More Detail

Proxies are fun, aren't they? Well, they'll affect you here as well. The httpfs setup directions call for these proxy-user settings in core-site.xml:

<property>
 <name>hadoop.proxyuser.httpfs.groups</name>
 <value>*</value>
</property>

<property>
 <name>hadoop.proxyuser.httpfs.hosts</name>
 <value>*</value>
</property>

This means httpfs.sh must be run as the httpfs user in order to work. If you want to run the service as another user, adjust the proxy settings above to name that user instead. For example, for root:

<property>
 <name>hadoop.proxyuser.root.groups</name>
 <value>*</value>
</property>

<property>
 <name>hadoop.proxyuser.root.hosts</name>
 <value>*</value>
</property>
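Since the property names embed the user name, a small generator avoids copy/paste mistakes when switching users. A sketch: proxyuser_xml is a hypothetical helper that prints the two properties for a given user, defaulting hosts and groups to * as above (you may prefer to narrow hosts to the httpfs server). Paste its output into core-site.xml; the NameNodes typically need a restart, or an hdfs dfsadmin -refreshSuperUserGroupsConfiguration, before the change takes effect.

```shell
#!/bin/bash
# Hypothetical helper: print the core-site.xml proxy-user properties that let
# USER (the account httpfs runs as) act on behalf of other users.
#   $1 user, $2 hosts (default *), $3 groups (default *)
proxyuser_xml() {
  local user="$1" hosts="${2:-*}" groups="${3:-*}"
  cat <<EOF
<property>
  <name>hadoop.proxyuser.${user}.groups</name>
  <value>${groups}</value>
</property>

<property>
  <name>hadoop.proxyuser.${user}.hosts</name>
  <value>${hosts}</value>
</property>
EOF
}

# Usage:
#   proxyuser_xml root
#   proxyuser_xml httpfs m1.hdp.local '*'
```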
Comments

Hi, I am unable to start my Hue browser. Could that be due to httpfs?


If the cluster is Kerberized, use kadmin to create a principal and keytab file for httpfs. Once the keytab is created, add a rule for this user to auth_to_local. Make sure the httpfs proxy-user configuration is present, then update the httpfs configuration files to point at the keytab.


Hi @dstreev 

I want to use an HTTPFS node in HDP 3.0 to access HDFS. Could you please let me know whether there is already a URL for it, or whether I need to install HTTPFS? If I need to install it, are the steps above applicable to HDP 3.0?

Best regards,


Very rewarding! Thanks


Hi @dstreev 

Thanks for your article. Correct me if I'm wrong, but couldn't the same be done with the Knox service, which comes with HDP by default?

Or does this service offer some extra feature?

Regards
Gerard