
There are two ways to access HDFS over HTTP.

Using WebHDFS

http://<active-namenode-server>:<namenode-port>/webhdfs/v1/<file-path>?op=OPEN

Using HttpFs

http://<hadoop-httpfs-server>:<httpfs-port>/webhdfs/v1/<file-path>?op=OPEN
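Both URL forms share the same `/webhdfs/v1/` path layout, so only the host and port differ. As a minimal sketch, the helper below builds either URL. The host names and file path are hypothetical, and the ports are assumptions based on common defaults (50070 for the NameNode HTTP port in Hadoop 2.x, 14000 for HttpFs); check your own cluster configuration.

```python
from urllib.parse import quote


def open_url(host: str, port: int, file_path: str) -> str:
    """Build an op=OPEN URL; the same layout serves WebHDFS and HttpFs."""
    # Percent-encode the HDFS path but keep the slashes between components.
    encoded = quote(file_path.lstrip("/"), safe="/")
    return f"http://{host}:{port}/webhdfs/v1/{encoded}?op=OPEN"


# Hypothetical hosts and assumed default ports, for illustration only.
webhdfs_url = open_url("namenode1.example.com", 50070, "/user/alice/data.csv")
httpfs_url = open_url("httpfs.example.com", 14000, "/user/alice/data.csv")
```

Because the paths are identical, a client can usually switch between the two services by changing only the host and port.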

WebHDFS:

Pros:

  • Built into the default Hadoop installation
  • Efficient, because data is streamed directly from the data node that holds it

Cons:

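  • Not failover-aware when high availability is enabled on the cluster: the client must address the active namenode explicitly, so requests fail after a failover unless the client switches to the new active namenode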
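One way a client can cope with the active-namenode requirement is to probe each namenode and use the first one that answers. The sketch below is an illustration, not a built-in Hadoop feature. It assumes an unsecured cluster that allows an unauthenticated `GETFILESTATUS` on `/`, and it assumes a standby namenode rejects the request with an HTTP error. The host names and port are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def responds_as_active(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if a GETFILESTATUS on "/" succeeds against this namenode."""
    url = f"{base_url}/webhdfs/v1/?op=GETFILESTATUS"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error responses, refused connections, and timeouts.
        return False


def pick_active(
    namenodes: Iterable[str],
    probe: Callable[[str], bool] = responds_as_active,
) -> Optional[str]:
    """Return the first namenode base URL that the probe accepts, else None."""
    for base_url in namenodes:
        if probe(base_url):
            return base_url
    return None
```

A caller would pass both namenode addresses, for example `pick_active(["http://nn1.example.com:50070", "http://nn2.example.com:50070"])`, and repeat the probe whenever a request starts failing.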

HttpFs:

Pros:

  • Works with HA enabled clusters.

Cons:

  • Needs to be installed as an additional service.
  • Can hurt performance, because all data is streamed through a single node.
  • Creates a single point of failure.

Additional performance implications of WebHDFS vs HttpFs:

https://www.linkedin.com/today/post/article/20140717115238-176301000-accessing-hdfs-using-the-webhdf...

WebHDFS vs HttpFs

Major difference between WebHDFS and HttpFs: WebHDFS needs access to all nodes of the cluster, and when data is read it is transmitted directly from the node that holds it. In HttpFs, a single node acts as a "gateway" and is the single point of data transfer to the client. HttpFs can therefore be choked during a large file transfer, but it minimizes the footprint required to access HDFS, because clients only need network access to the gateway.
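The difference shows up in the first HTTP response to an `op=OPEN` request. With WebHDFS, the namenode answers with an HTTP 307 redirect whose `Location` header points at a data node. With HttpFs, the gateway returns the file content itself. As a rough sketch, the helper below reports which host ends up streaming the bytes. The function name, host names, and ports are hypothetical, and the responses are simulated rather than fetched from a real cluster.

```python
from typing import Optional
from urllib.parse import urlsplit


def data_endpoint(request_url: str, status: int,
                  location: Optional[str] = None) -> str:
    """Return the host:port that will actually stream the file bytes."""
    if status == 307 and location:
        # WebHDFS: the namenode redirected the client to a data node.
        return urlsplit(location).netloc
    # HttpFs: the server that received the request also serves the data.
    return urlsplit(request_url).netloc


# Simulated first responses (hypothetical hosts, assumed Hadoop 2.x ports).
via_webhdfs = data_endpoint(
    "http://namenode1.example.com:50070/webhdfs/v1/user/alice/data.csv?op=OPEN",
    status=307,
    location="http://datanode3.example.com:50075/webhdfs/v1/user/alice/data.csv?op=OPEN&offset=0",
)
via_httpfs = data_endpoint(
    "http://httpfs.example.com:14000/webhdfs/v1/user/alice/data.csv?op=OPEN",
    status=200,
)
```

In the WebHDFS case the client must be able to reach `datanode3.example.com` directly, which is why WebHDFS needs access to every node while HttpFs only needs access to the gateway.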

Last update: 02-19-2016 06:00 AM (revision 1 of 1)